Multi-Model Response Comparator
Compare responses from multiple AI models for the same task and summarize differences in quality, style, speed, and likely cost. Best for model selection, ev...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by @xujfcn
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
The name/description (compare multiple models) matches the SKILL.md, rubric, example prompts, and eval scenarios. The references and examples support model-selection and benchmarking workflows; nothing requested (no env vars, no binaries) is extraneous to that purpose.
Instruction Scope
Runtime instructions are scoped to running identical prompts across 2–4 models, scoring tradeoffs, and producing a structured comparison. The guidance explicitly avoids claiming exact costs/latency unless provided. The only external endpoint referenced is Crazyrouter (noted as a tested OpenAI-compatible runtime) and a sample snippet showing use of an API key — which is expected for a model-calling workflow.
Install Mechanism
No install spec or code to download/execute is present; this is an instruction-only skill, which minimizes filesystem and supply-chain risk.
Credentials
The skill declares no required environment variables or credentials. The SKILL.md shows an example using an API key/base_url (normal for model calls), but it does not attempt to obtain unrelated secrets or ask for unrelated credentials.
Persistence & Privilege
The skill is not always-enabled and does not request system-wide changes or modify other skills. Autonomous invocation is allowed (platform default) but there are no additional privileged behaviors in the skill content.
Assessment
This skill is an instruction-only rubric for comparing model outputs and appears internally consistent. Before installing, confirm where model requests will be routed (your agent's configured runtime or Crazyrouter) and whether that endpoint's privacy and data-retention policy is acceptable for your data. The skill will require whatever API keys your agent or runtime normally uses to call models; do not submit sensitive secrets or private data unless you trust the chosen runtime. Also note that the manifest indicates draft/internal visibility, so consider testing with non-sensitive example prompts first.
Current version: v0.2.0
SKILL.md
Multi-Model Response Comparator
Compare answers from multiple AI models for the same prompt, then summarize tradeoffs across quality, style, and likely use cases.
When to use
- choosing between models for a workflow
- benchmarking prompt behavior
- checking whether a stronger model is worth the cost
- generating second opinions on important outputs
Recommended runtime
This skill works with OpenAI-compatible runtimes and has been tested on Crazyrouter.
Required output format
Always structure the final comparison with these sections:
- Task summary
- Models compared
- Strengths by model
- Weaknesses by model
- Best model by use case
- Cost/latency sensitivity note
- Final recommendation
Suggested workflow
- pick 2-4 models
- run the same prompt on each model
- compare structure, depth, correctness, tone, and likely latency/cost
- score or describe tradeoffs using the comparison rubric
- produce a recommendation by use case, not just one universal winner
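The fan-out step above can be sketched in Python. This is a minimal illustration, not the skill's own code: `call_model` is a hypothetical stand-in for whatever OpenAI-compatible client your runtime provides, and the model names are placeholders.

```python
def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub: replace with a real client call, e.g.
    # client.chat.completions.create(model=model, messages=[...]).
    return f"[{model}] response to: {prompt}"

def fan_out(prompt: str, models: list[str]) -> dict[str, str]:
    """Run the identical prompt on each model; collect responses keyed by model."""
    return {model: call_model(model, prompt) for model in models}

responses = fan_out(
    "Draft a two-sentence support email apology.",
    ["model-a", "model-b", "model-c"],  # pick 2-4 models, per the workflow
)
for model, text in responses.items():
    print(model, "->", text)
```

Keeping the prompt identical across models (rather than tuning it per model) is what makes the downstream comparison meaningful.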
Comparison rules
- Use the same prompt and same success criteria for all models.
- Do not claim exact cost or latency unless the user provides them.
- If metrics are inferred, label them as likely or expected.
- Separate writing quality from factual reliability.
- For coding tasks, prioritize correctness, edge cases, and implementation completeness.
Example prompts
- Compare GPT, Claude, and Gemini on this support email draft.
- Run this coding prompt across three models and summarize which one is most production-ready.
- Compare low-cost vs premium models for a blog outline task.
References
Read these when preparing the final comparison:
- references/comparison-rubric.md
- references/example-prompts.md
Crazyrouter example
from openai import OpenAI

# Point the OpenAI SDK at the Crazyrouter OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",  # your Crazyrouter API key
    base_url="https://crazyrouter.com/v1",
)
Recommended artifacts
- catalog.json
- provenance.json
- market-manifest.json
- evals/evals.json
Files
8 total