Install
openclaw skills install promptfooWork with Promptfoo for local, repeatable LLM evals and red-team testing. Use when a request explicitly involves Promptfoo, `promptfooconfig.yaml`, Promptfoo CLI commands (`promptfoo eval`, `validate`, `view`, `redteam`, `generate`, `mcp`), Promptfoo examples, assertions/metrics, provider comparisons, RAG evals, agent evals, or converting an LLM app/API workflow into Promptfoo-based quality or security tests. Do not trigger for generic prompt-writing alone unless the task specifically needs Promptfoo.
openclaw skills install promptfooUse Promptfoo when the task is specifically about Promptfoo-based evals, regression suites, or red-team scans.
Trigger this skill for:
promptfooconfig.yaml creation or editingDo not trigger this skill for:
Promptfoo examples consistently use these patterns:
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
descriptionprompts, providers and/or targetstests with varsdefaultTest.assertfile://... references for prompts, docs, agents, tools, or local filesUse Promptfoo's idioms directly instead of inventing a custom layout.
Choose the workflow that matches the request:
promptfoo init or the helper scripts in scripts/promptfooconfig.yaml, then run promptfoo validatepromptfoo evalpromptfoo list, promptfoo show, promptfoo logs, promptfoo viewpromptfoo redteam ...promptfoo generate ...promptfoo mcppromptfooconfig.yamlPrefer small, representative suites before big fan-out runs.
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Getting started
prompts:
- 'Convert the following English text to {{language}}: {{input}}'
providers:
- openai:gpt-5.2
- openai:gpt-5-mini
tests:
- vars:
language: French
input: Hello world
assert:
- type: contains
value: 'Bonjour le monde'
defaultTest# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Automatic response evaluation using LLM rubric scoring
prompts:
- file://prompts.txt
providers:
- openai:chat:gpt-5.2
defaultTest:
assert:
- type: llm-rubric
value: Do not mention that you are an AI or chat assistant
tests:
- vars:
name: Bob
question: Can you help me find a specific product on your website?
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Evaluating RAG responses using multiple quality metrics
prompts:
- |
You are an internal corporate chatbot.
Respond to this query: {{query}}
Here is some context that you can use to write your response: {{context}}
providers:
- openai:gpt-4.1-mini
tests:
- vars:
query: What is the max purchase that doesn't require approval?
context: file://docs/reimbursement.md
assert:
- type: contains
value: '$500'
- type: factuality
value: the employee's manager is responsible for approvals
- type: answer-relevance
threshold: 0.9
- type: context-faithfulness
threshold: 0.9
Promptfoo supports richer provider objects for agent systems:
providers:
- id: openai:agents:my-agent
config:
agent: file://./agents/my-agent.ts
tools: file://./tools/my-tools.ts
maxTurns: 20
Use provider objects like this when evaluating agent workflows instead of flattening everything into plain prompt strings.
Use the simplest assertion that reliably captures the requirement.
Prefer, roughly in this order:
contains, icontains, regex, contains-any, is-json, latency, costllm-rubric, factuality, answer-relevance, context-faithfulness, context-recall, context-relevancejavascript or python assertions when built-ins are insufficientassert-set when you need grouped pass criteriaGuidance:
contains-any or contains-all when wording can vary but key content must appearjavascript for custom logic only when a built-in metric does not fitIf Promptfoo already has a relevant example, use it as the base:
npx promptfoo@latest init --example getting-started
npx promptfoo@latest init --example eval-rag
npx promptfoo@latest init --example compare-openai-models
If the task matches one of the helper scripts in scripts/, use the script to scaffold a repo-aligned starter config quickly.
npx promptfoo@latest validate
If the config is not in the current directory, pass the correct path.
npx promptfoo@latest eval
npx promptfoo@latest eval --filter-first-n 5
npx promptfoo@latest eval --filter-pattern "refund|billing"
npx promptfoo@latest eval --filter-providers "openai|anthropic"
npx promptfoo@latest eval --max-concurrency 4
Use filters for large or expensive suites.
npx promptfoo@latest view
npx promptfoo@latest list evals
npx promptfoo@latest show <eval-id>
npx promptfoo@latest logs
Focus on:
For RAG:
file://... documents when the source material is localfactuality, answer-relevance, context-recall, context-relevance, and context-faithfulness where appropriateFor agents:
Use this when the goal is jailbreak testing, prompt injection resistance, policy enforcement, data exposure checks, or broader AI vulnerability scanning.
npx promptfoo@latest redteam setup
npx promptfoo@latest redteam run
npx promptfoo@latest redteam report
npx promptfoo@latest redteam init --no-gui
For red-team configs, pay special attention to:
purpose - what the system does, who the legitimate user is, and what must stay protectedtargets - HTTP endpoint, model provider, custom script, or direct app integrationUse stable target labels over time so reports stay comparable.
After a scan, summarize:
Do not leave red-team results as a one-off report. Convert serious failures into permanent eval coverage.
npx promptfoo@latest init
npx promptfoo@latest init --example getting-started
npx promptfoo@latest init --example eval-rag
npx promptfoo@latest init --example compare-openai-models
npx promptfoo@latest validate
npx promptfoo@latest eval
npx promptfoo@latest eval --resume
npx promptfoo@latest eval --retry-errors
npx promptfoo@latest list evals
npx promptfoo@latest show <eval-id>
npx promptfoo@latest logs
npx promptfoo@latest view
npx promptfoo@latest generate dataset
npx promptfoo@latest generate assertions
npx promptfoo@latest redteam setup
npx promptfoo@latest redteam run
npx promptfoo@latest redteam report
npx promptfoo@latest mcp --transport stdio
Promptfoo can fan out quickly. Keep it under control.
--retry-errors for transient failures--resume for interrupted longer runsIf an eval behaves strangely:
promptfoo validatepromptfoo logsvarstargets instead of a toy prompt stringCommon failure classes:
Use these helper scripts when they fit:
scripts/scaffold-promptfoo-config.py - generate repo-aligned starter configs for common modesscripts/promptfoo-preflight.py - inspect a Promptfoo workspace and suggest the next commandRead these as needed:
references/config-patterns.md for repo-aligned templates and selection heuristicsreferences/example-notes.md for patterns observed in Promptfoo's own examples