Install

```
openclaw skills install critic-agent
```

A specialized agent that reviews, critiques, and scores the outputs of other agents against a structured rubric covering correctness, clarity, completeness, and safety. This skill enables quality-control loops in multi-agent workflows by providing numeric scores, objective feedback, and actionable suggestions for improvement.
The Critic Agent acts as an independent reviewer that evaluates agent outputs against four dimensions: correctness, clarity, completeness, and safety.
The critic generates a numeric score (0-100) and detailed, actionable feedback for each dimension, enabling validation steps in multi-agent workflows, automated quality gates, and feedback-driven retry loops.
```
openclaw skills run critic-agent \
  --task "Write a Python function to parse CSV files" \
  --agent-output "def parse_csv(path): ..." \
  --context '{"requirements": ["handle edge cases", "include docstring"]}'
```
When building multi-agent systems, integrate the critic as a validation step:
```yaml
workflow:
  - agent: writer
    task: generate_initial_draft
  - agent: critic
    task: review_output
    inputs:
      task: "{{writer.task}}"
      agentOutput: "{{writer.result}}"
      context: "{{writer.context}}"
  - if: "critic.score >= 70"
    then: deliver
    else: retry
```
Input schema:

```json
{
  "type": "object",
  "required": ["task", "agentOutput"],
  "properties": {
    "task": {
      "type": "string",
      "description": "The original task or prompt given to the agent being reviewed"
    },
    "agentOutput": {
      "type": "string",
      "description": "The output to critique (code, text, analysis, etc.)"
    },
    "context": {
      "type": "object",
      "description": "Additional context including requirements, constraints, success criteria",
      "properties": {
        "requirements": {
          "type": "array",
          "items": { "type": "string" }
        },
        "successCriteria": {
          "type": "array",
          "items": { "type": "string" }
        },
        "constraints": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }
  }
}
```
Output schema:

```json
{
  "type": "object",
  "required": ["score", "feedback", "overall", "suggestions"],
  "properties": {
    "score": {
      "type": "number",
      "minimum": 0,
      "maximum": 100,
      "description": "Overall weighted score"
    },
    "feedback": {
      "type": "object",
      "properties": {
        "correctness": {
          "type": "string",
          "description": "Feedback on factual/technical accuracy (40% weight)"
        },
        "clarity": {
          "type": "string",
          "description": "Feedback on readability and organization (25% weight)"
        },
        "completeness": {
          "type": "string",
          "description": "Feedback on coverage and thoroughness (25% weight)"
        },
        "safety": {
          "type": "string",
          "description": "Feedback on ethical and safety considerations (10% weight)"
        }
      },
      "required": ["correctness", "clarity", "completeness", "safety"]
    },
    "overall": {
      "type": "string",
      "description": "Summarized overall assessment (1-2 sentences)"
    },
    "suggestions": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Actionable improvement suggestions"
    },
    "dimensionScores": {
      "type": "object",
      "properties": {
        "correctness": { "type": "number", "minimum": 0, "maximum": 100 },
        "clarity": { "type": "number", "minimum": 0, "maximum": 100 },
        "completeness": { "type": "number", "minimum": 0, "maximum": 100 },
        "safety": { "type": "number", "minimum": 0, "maximum": 100 }
      }
    }
  }
}
```
The overall score is a weighted average of the four dimension scores:

Overall = (Correctness × 0.40) + (Clarity × 0.25) + (Completeness × 0.25) + (Safety × 0.10)
- Correctness (40%): technical accuracy, factual correctness, absence of errors
- Clarity (25%): readability, logical structure, effective communication
- Completeness (25%): coverage of requirements, edge cases, thoroughness
- Safety (10%): ethical compliance, bias awareness, security considerations
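The weighted formula can be sketched as a small helper. This is an illustrative sketch only (the function name `overallScore` is hypothetical; the skill's own scoring lives in scripts/score-helper.js):

```javascript
// Hypothetical sketch of the documented weighted-score formula.
// The skill's real implementation is in scripts/score-helper.js.
function overallScore({ correctness, clarity, completeness, safety }) {
  return correctness * 0.40 + clarity * 0.25 + completeness * 0.25 + safety * 0.10;
}

// Example: strong correctness and safety, weaker completeness.
const score = overallScore({ correctness: 80, clarity: 90, completeness: 70, safety: 100 });
console.log(score.toFixed(1)); // "82.0"
```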
When invoking the critic, you can override defaults:
```
openclaw skills run critic-agent \
  --config '{"model": "openrouter/anthropic/claude-3.5-sonnet", "thresholds": {"pass": 80}}'
```
| Option | Type | Default | Description |
|---|---|---|---|
| `model` | string | configured default | Model to use for critique |
| `thresholds.pass` | number | 70 | Minimum score to pass validation |
| `thresholds.needsRevision` | number | 50 | Minimum score to avoid auto-retry |
| `autoRetry` | boolean | false | Automatically trigger retry if below threshold |
| `maxRetries` | number | 3 | Maximum retry attempts when `autoRetry` is enabled |
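Taken together, the two thresholds partition scores into three outcomes. A minimal sketch, assuming the defaults above (the helper name `classifyScore` is hypothetical, not part of the skill's API):

```javascript
// Hypothetical sketch of how the documented thresholds could classify a
// critique score; the real decision logic is internal to the skill.
function classifyScore(score, thresholds = { pass: 70, needsRevision: 50 }) {
  if (score >= thresholds.pass) return "pass";                    // deliver the output
  if (score >= thresholds.needsRevision) return "needs-revision"; // revise, but no auto-retry
  return "retry";                                                 // eligible for auto-retry if enabled
}
```

Raising `thresholds.pass` (for example to 80, as in the `--config` example above) tightens the gate without changing the retry boundary.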
The critic agent receives a system prompt that defines its evaluation framework:
```text
You are a Critic Agent responsible for evaluating the quality of outputs from other AI agents.

Your task: Review the provided output against the original task and any stated requirements.

Evaluation Dimensions:
1. Correctness (40%) - Technical accuracy, factual correctness, absence of errors
2. Clarity (25%) - Readability, logical structure, effective communication
3. Completeness (25%) - Coverage of requirements, edge cases, thoroughness
4. Safety (10%) - Ethical compliance, bias awareness, security considerations

For each dimension, provide:
- A score from 0-100
- Specific feedback explaining the score
- Concrete suggestions for improvement

Calculate the overall score: (correctness * 0.40) + (clarity * 0.25) + (completeness * 0.25) + (safety * 0.10)

Respond in exact JSON format:
{
  "score": 85,
  "feedback": {
    "correctness": "The implementation correctly handles the basic case but misses edge case X...",
    "clarity": "Well-structured but variable names could be more descriptive...",
    "completeness": "Covers requirements A and B but ignores requirement C...",
    "safety": "No safety concerns identified..."
  },
  "overall": "Good effort with one critical edge case to fix.",
  "suggestions": ["Add input validation for empty strings", "Include error handling"]
}
```
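Because the critic must respond in exact JSON, callers may want to validate the reply before trusting its score. A minimal sketch, assuming the response shape above (`parseCritique` is a hypothetical helper, not part of the skill):

```javascript
// Hypothetical validation of a raw critic response, assuming the JSON
// shape shown in the system prompt above.
function parseCritique(raw) {
  const critique = JSON.parse(raw); // throws on malformed JSON
  for (const key of ["score", "feedback", "overall", "suggestions"]) {
    if (!(key in critique)) throw new Error(`Missing required field: ${key}`);
  }
  if (typeof critique.score !== "number" || critique.score < 0 || critique.score > 100) {
    throw new Error("score must be a number between 0 and 100");
  }
  return critique;
}
```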
Customize the critic based on the output type:
For code reviews:
Focus on: correctness, edge cases, error handling, code quality, security vulnerabilities.
For written content:
Focus on: argument coherence, evidence support, audience appropriateness, factual claims.
For data analysis:
Focus on: methodology soundness, statistical validity, conclusion support, bias detection.
- scripts/critic.js - Main critic implementation
- scripts/score-helper.js - Utility for computing the final score and applying thresholds
- references/patterns.md - Usage patterns and examples
- references/configuration.md - Full configuration reference

```javascript
const result = await skill.run('critic-agent', {
  task: originalPrompt,
  agentOutput: generatedContent,
  context: { requirements: [...], constraints: [...] }
});

if (result.score >= 70) {
  console.log('Passed:', result.overall);
} else {
  console.log('Needs work:', result.suggestions);
}
```
```javascript
async function generateWithQualityGate(task, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = await generateAgentResponse(task);
    const critique = await skill.run('critic-agent', {
      task,
      agentOutput: output,
      context: {}
    });

    if (critique.score >= 70) {
      return { output, critique };
    }

    if (attempt === maxAttempts) break;

    // Incorporate feedback in next attempt
    task = `${task}\n\nPrevious feedback: ${critique.suggestions.join('; ')}`;
  }
  throw new Error('Failed to meet quality threshold after retries');
}
```
```
Writer Agent → [Output] → Critic Agent → [Score + Feedback]
                                               ↓
                                    If score < threshold
                                               ↓
                              Reject + send feedback to Writer
                                               ↓
                              Writer revises and resubmits
```
If the critic agent fails (model unavailable, timeout, malformed response), the behavior depends on configuration:
| Scenario | Default Behavior | Config Override |
|---|---|---|
| LLM API failure | Pass through original output with warning log | onCriticError: "fail" to reject |
| Invalid JSON response | Use heuristic fallback scoring (simple keyword checks) | onCriticError: "reject" |
| Timeout | Treat as score = 0 (fail) | onCriticError: "pass" to auto-pass |
| Model not found | Fallback to configured default model | N/A (auto-handled) |
Configure fallback:
```json
{
  "onCriticError": "pass",
  "fallbackModel": "openrouter/default-fallback"
}
```

`onCriticError` accepts `"pass"`, `"fail"`, or `"reject"`.
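The policies in the table above can be sketched as a single branch. This is an illustrative sketch only; the actual failure handling is implemented by the skill runtime, and `resolveCriticFailure` is a hypothetical name:

```javascript
// Hypothetical sketch of applying an onCriticError policy when the
// critique step itself fails (API error, timeout, malformed response).
function resolveCriticFailure(policy, originalOutput) {
  switch (policy) {
    case "pass":   // deliver the original output anyway, with a warning
      return { delivered: true, output: originalOutput, warning: "critic unavailable" };
    case "fail":
    case "reject": // treat the validation step as failed and withhold the output
      return { delivered: false, output: null, warning: "critic unavailable" };
    default:
      throw new Error(`Unknown onCriticError policy: ${policy}`);
  }
}
```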
Troubleshooting

- Critic returns low scores on everything
- Scores are inconsistent across runs
- Critic hangs or times out