Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Critic Agent

v1.0.0

Evaluates agent outputs for correctness, clarity, completeness, and safety, providing numeric scores and detailed feedback for quality control.

0 stars · 270 downloads · 2 current · 2 all-time
by Tico-Wang (@wang-erqian)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for wang-erqian/critic-agent.

Prompt Preview: Install & Setup
Install the skill "Critic Agent" (wang-erqian/critic-agent) from ClawHub.
Skill page: https://clawhub.ai/wang-erqian/critic-agent
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install wang-erqian/critic-agent

ClawHub CLI

Package manager switcher

npx clawhub@latest install critic-agent
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name, description, SKILL.md, system prompt, and helper script all align: this is a critic that scores outputs along the documented dimensions and returns JSON. The compute-score helper and prompt are appropriate and proportional to the stated purpose.
Instruction Scope
The runtime instructions stay within the critic's scope (evaluate the given output and return JSON). The SKILL.md and prompt do not request unrelated files or credentials. However, the test script spawns a local agent via openclaw --local and expects an LLM provider; that runtime behavior requires care, since executing it may trigger network calls to your configured model.
Install Mechanism
There is no install spec and no downloads; the skill is instruction-first with two small local scripts (Node script and shell test). Nothing writes arbitrary external code or pulls remote archives during install.
Credentials
The registry metadata lists no required env vars, but the documentation and test script reference OPENROUTER_API_KEY, CRITIC_DEFAULT_MODEL, and CRITIC_LOG_PATH. Requesting an LLM API key is reasonable for this skill, but the skill should declare these as required/optional to avoid surprises. Also ensure the log path is appropriate for sensitive content since critiques may include parts of agent outputs.
Persistence & Privilege
always:false and no install-time modifications are requested. The skill suggests logging critiques to memory/critic-log.json (expected behavior). It does not request elevated or system-wide privileges.
What to consider before installing
This skill appears to do what it says, but review a few things before installing or running it:

  • The docs and test script reference OPENROUTER_API_KEY and other env vars even though the registry declares none; if you run the test or use --local, you may need to provide an LLM API key. Treat any API key as sensitive.
  • Inspect and edit scripts/test-critic.sh before running: it uses an absolute path (/home/weq/...) to load the prompt and to call compute-score.js, which will fail or behave unexpectedly on your system and leaks local path assumptions. Replace hard-coded paths with relative or environment-driven paths.
  • Decide where critique logs (CRITIC_LOG_PATH / memory/critic-log.json) will be written. Critiques may include full agent outputs, which could contain secrets; ensure the log location is secure and consider redaction or limited retention.
  • The skill invokes openclaw agent --local, which will call your configured model provider; confirm model, timeout, and cost settings before running at scale.

If you plan to use this skill in production, ask the publisher to (1) declare optional/required env vars in the registry metadata, (2) remove hard-coded local paths from test scripts, and (3) document what gets logged along with recommended log retention and permissions.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97f5b301hxzw4fe3e3x5hegcn82tnaj
270 downloads
0 stars
1 version
Updated 14h ago
v1.0.0
MIT-0

Critic Agent Skill

A specialized agent that reviews, critiques, and scores the outputs of other agents using a structured rubric. This skill enables quality control loops in multi-agent workflows by providing objective feedback and actionable suggestions for improvement.

Description

The Critic Agent acts as an independent reviewer that evaluates agent outputs against four dimensions:

  • Correctness (40%): Factual accuracy, logical soundness, absence of errors
  • Clarity (25%): Readability, organization, communication effectiveness
  • Completeness (25%): Coverage of requirements, edge cases, thoroughness
  • Safety (10%): Ethical considerations, potential harms, compliance with guidelines

The critic generates a numeric score (0-100) and provides detailed, actionable feedback for each dimension. This enables:

  • Quality gates before final delivery
  • Iterative improvement loops
  • Consistent evaluation standards across team outputs

Usage

Basic Invocation

openclaw skills run critic-agent \
  --task "Write a Python function to parse CSV files" \
  --agent-output "def parse_csv(path): ..." \
  --context '{"requirements": ["handle edge cases", "include docstring"]}'

In Agent Workflow

When building multi-agent systems, integrate the critic as a validation step:

workflow:
  - agent: writer
    task: generate_initial_draft
  - agent: critic
    task: review_output
    inputs:
      task: "{{writer.task}}"
      agentOutput: "{{writer.result}}"
      context: "{{writer.context}}"
  - if: "critic.score >= 70"
    then: deliver
    else: retry

Input Schema

{
  "type": "object",
  "required": ["task", "agentOutput"],
  "properties": {
    "task": {
      "type": "string",
      "description": "The original task or prompt given to the agent being reviewed"
    },
    "agentOutput": {
      "type": "string",
      "description": "The output to critique (code, text, analysis, etc.)"
    },
    "context": {
      "type": "object",
      "description": "Additional context including requirements, constraints, success criteria",
      "properties": {
        "requirements": {
          "type": "array",
          "items": { "type": "string" }
        },
        "successCriteria": {
          "type": "array",
          "items": { "type": "string" }
        },
        "constraints": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }
  }
}
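
For illustration, here is a payload that conforms to this schema, reusing the values from the Basic Invocation example above (the successCriteria and constraints entries are hypothetical):

```json
{
  "task": "Write a Python function to parse CSV files",
  "agentOutput": "def parse_csv(path): ...",
  "context": {
    "requirements": ["handle edge cases", "include docstring"],
    "successCriteria": ["passes unit tests"],
    "constraints": ["standard library only"]
  }
}
```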

Output Schema

{
  "type": "object",
  "required": ["score", "feedback", "overall", "suggestions"],
  "properties": {
    "score": {
      "type": "number",
      "minimum": 0,
      "maximum": 100,
      "description": "Overall weighted score"
    },
    "feedback": {
      "type": "object",
      "properties": {
        "correctness": {
          "type": "string",
          "description": "Feedback on factual/technical accuracy (40% weight)"
        },
        "clarity": {
          "type": "string",
          "description": "Feedback on readability and organization (25% weight)"
        },
        "completeness": {
          "type": "string",
          "description": "Feedback on coverage and thoroughness (25% weight)"
        },
        "safety": {
          "type": "string",
          "description": "Feedback on ethical and safety considerations (10% weight)"
        }
      },
      "required": ["correctness", "clarity", "completeness", "safety"]
    },
    "overall": {
      "type": "string",
      "description": "Summarized overall assessment (1-2 sentences)"
    },
    "suggestions": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Actionable improvement suggestions"
    },
    "dimensionScores": {
      "type": "object",
      "properties": {
        "correctness": { "type": "number", "minimum": 0, "maximum": 100 },
        "clarity": { "type": "number", "minimum": 0, "maximum": 100 },
        "completeness": { "type": "number", "minimum": 0, "maximum": 100 },
        "safety": { "type": "number", "minimum": 0, "maximum": 100 }
      }
    }
  }
}

Scoring Rubric

Overall Score Calculation

Overall = (Correctness × 0.40) + (Clarity × 0.25) + (Completeness × 0.25) + (Safety × 0.10)
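
As a concrete sketch of this calculation (not the skill's actual helper, and rounding to a whole number is an assumption):

```javascript
// Weighted-score sketch; dimension scores are assumed to be 0-100 numbers.
// NOTE: rounding to an integer is an assumption, not documented behavior.
const WEIGHTS = { correctness: 0.40, clarity: 0.25, completeness: 0.25, safety: 0.10 };

function computeOverall(dimensionScores) {
  const raw = Object.entries(WEIGHTS).reduce(
    (sum, [dim, weight]) => sum + dimensionScores[dim] * weight,
    0
  );
  return Math.round(raw);
}

// computeOverall({ correctness: 80, clarity: 90, completeness: 70, safety: 100 })
// → 32 + 22.5 + 17.5 + 10 = 82
```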

Dimension Definitions

Correctness (40%)

  • Does the output contain factual errors?
  • Is the logic/implementation sound?
  • Are technical claims accurate?
  • Do code examples actually work?
  • Are sources/references reliable?

Clarity (25%)

  • Is the language clear and unambiguous?
  • Is the structure logical and easy to follow?
  • Are key points emphasized appropriately?
  • Is formatting used effectively?
  • Would the intended audience understand it?

Completeness (25%)

  • Are all requirements addressed?
  • Are edge cases considered?
  • Are necessary details provided?
  • Is there missing context that should be included?
  • Does it cover the scope fully?

Safety (10%)

  • Does it promote harmful behavior?
  • Are biases acknowledged and mitigated?
  • Does it comply with ethical guidelines?
  • Could it be misused maliciously?
  • Are security/privacy concerns addressed?

Score Thresholds

  • 80-100: Excellent - ready for delivery
  • 70-79: Good - minor revisions suggested
  • 50-69: Needs Revision - significant issues to address
  • 0-49: Fail - major problems, reject and redo
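
The bands above can be expressed as a small lookup helper (illustrative only; the function and band names are not part of the skill):

```javascript
// Maps an overall 0-100 score to the bands in the Score Thresholds list.
function scoreBand(score) {
  if (score >= 80) return 'excellent';
  if (score >= 70) return 'good';
  if (score >= 50) return 'needs-revision';
  return 'fail';
}
```

A quality gate might deliver on 'excellent' or 'good' and route everything else back to the writer.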

Configuration

When invoking the critic, you can override defaults:

openclaw skills run critic-agent \
  --config '{"model": "openrouter/anthropic/claude-3.5-sonnet", "thresholds": {"pass": 80}}'

Configuration Options

Option                    Type     Default             Description
model                     string   configured default  Model to use for critique
thresholds.pass           number   70                  Minimum score to pass validation
thresholds.needsRevision  number   50                  Minimum score to avoid auto-retry
autoRetry                 boolean  false               Automatically trigger retry if below threshold
maxRetries                number   3                   Maximum retry attempts when autoRetry is enabled

Example Prompts for Critic Agent

Prompt Template

The critic agent receives a system prompt that defines its evaluation framework:

You are a Critic Agent responsible for evaluating the quality of outputs from other AI agents.

Your task: Review the provided output against the original task and any stated requirements.

Evaluation Dimensions:
1. Correctness (40%) - Technical accuracy, factual correctness, absence of errors
2. Clarity (25%) - Readability, logical structure, effective communication
3. Completeness (25%) - Coverage of requirements, edge cases, thoroughness
4. Safety (10%) - Ethical compliance, bias awareness, security considerations

For each dimension, provide:
- A score from 0-100
- Specific feedback explaining the score
- Concrete suggestions for improvement

Calculate the overall score: (correctness * 0.40) + (clarity * 0.25) + (completeness * 0.25) + (safety * 0.10)

Respond in exact JSON format:
{
  "score": 85,
  "feedback": {
    "correctness": "The implementation correctly handles the basic case but misses edge case X...",
    "clarity": "Well-structured but variable names could be more descriptive...",
    "completeness": "Covers requirements A and B but ignores requirement C...",
    "safety": "No safety concerns identified..."
  },
  "overall": "Good effort with one critical edge case to fix.",
  "suggestions": ["Add input validation for empty strings", "Include error handling"]
}

Task-Specific Prompts

Customize the critic based on the output type:

For code reviews:

Focus on: correctness, edge cases, error handling, code quality, security vulnerabilities.

For written content:

Focus on: argument coherence, evidence support, audience appropriateness, factual claims.

For data analysis:

Focus on: methodology soundness, statistical validity, conclusion support, bias detection.

Implementation

Scripts

scripts/critic.js - Main critic implementation that:

  • Accepts input JSON via stdin
  • Constructs appropriate critique prompt
  • Calls LLM with structured output enforcement
  • Validates and normalizes output
  • Returns JSON result

scripts/score-helper.js - Utility for computing final score and thresholds
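
The shipped scripts are not reproduced here; as a rough sketch of the stdin-parsing step the bullets describe (the function name and error text are illustrative):

```javascript
// Illustrative sketch: parse the stdin payload and enforce the two required
// fields from the input schema before any critique prompt is constructed.
function parseInput(raw) {
  const input = JSON.parse(raw);
  for (const field of ['task', 'agentOutput']) {
    if (typeof input[field] !== 'string') {
      throw new Error(`Missing required field: ${field}`);
    }
  }
  return input;
}
```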

References

  • references/patterns.md - Usage patterns and examples
  • references/configuration.md - Full configuration reference

Integration Guide

Single Critic Call

const result = await skill.run('critic-agent', {
  task: originalPrompt,
  agentOutput: generatedContent,
  context: { requirements: [...], constraints: [...] }
});

if (result.score >= 70) {
  console.log('Passed:', result.overall);
} else {
  console.log('Needs work:', result.suggestions);
}

Retry Loop with Critic

async function generateWithQualityGate(task, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = await generateAgentResponse(task);

    const critique = await skill.run('critic-agent', {
      task,
      agentOutput: output,
      context: {}
    });

    if (critique.score >= 70) {
      return { output, critique };
    }

    if (attempt === maxAttempts) break;

    // Incorporate feedback in next attempt
    task = `${task}\n\nPrevious feedback: ${critique.suggestions.join('; ')}`;
  }

  throw new Error('Failed to meet quality threshold after retries');
}

Parallel Critic Multi-Agent Workflow

Writer Agent → [Output] → Critic Agent → [Score + Feedback]
                                       ↓
                                   If score < threshold
                                       ↓
                              Reject + Send feedback to Writer
                                       ↓
                              Writer revises and resubmits

Fallback Behavior

If the critic agent fails (model unavailable, timeout, malformed response), the behavior depends on configuration:

Scenario               Default Behavior                                        Config Override
LLM API failure        Pass through original output with warning log           onCriticError: "fail" to reject
Invalid JSON response  Use heuristic fallback scoring (simple keyword checks)  onCriticError: "reject"
Timeout                Treat as score = 0 (fail)                               onCriticError: "pass" to auto-pass
Model not found        Fall back to configured default model                   N/A (auto-handled)

Configure fallback:

{
  "onCriticError": "pass" | "fail" | "reject",
  "fallbackModel": "openrouter/default-fallback"
}

Limitations

  • Critique quality depends on the underlying LLM's capability
  • Subjective dimensions (clarity) may vary between runs
  • Not suitable for real-time or streaming evaluation (requires complete output)
  • Cannot guarantee perfect detection of all safety issues
  • Scoring is indicative, not absolute truth

Best Practices

  1. Use as advisory: Critic suggestions should inform but not replace human judgment
  2. Calibrate thresholds: Adjust pass thresholds based on your quality requirements
  3. Review borderline cases: Scores 65-75 deserve human spot-check
  4. Log all critiques: Record feedback for continuous improvement
  5. Iterate on prompts: Customize critic prompts for your specific domain
  6. Combine multiple critics: For high-stakes outputs, use 2-3 different critic models
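
Practice 6 can be sketched as a hypothetical consensus helper; averaging and the 15-point disagreement limit are arbitrary illustrative choices, not part of the skill:

```javascript
// Hypothetical consensus for multiple critic runs: average the scores and
// flag wide disagreement for a human spot-check (practice 3 above).
// `critiques` is an array of result objects shaped like the output schema.
function consensus(critiques, spreadLimit = 15) {
  const scores = critiques.map(c => c.score);
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const spread = Math.max(...scores) - Math.min(...scores);
  return { score: mean, needsHumanReview: spread > spreadLimit };
}
```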

Future Enhancements

  • Multi-critic consensus (aggregate scores from multiple models)
  • Domain-specific rubrics (customize weights per task type)
  • Historical learning (store critiques to identify recurring issues)
  • Interactive critique (allow back-and-forth between writer and critic)
  • Automated remediation (auto-apply simple fixes from suggestions)

Troubleshooting

Critic returns low scores on everything

  • Check if requirements are clearly stated in context
  • Verify prompt template matches your output type
  • Try a more capable model

Scores inconsistent across runs

  • Temperature may be too high; set to 0 for deterministic evaluation
  • Add specific examples in the system prompt to anchor scoring

Critic hangs or times out

  • Increase timeout setting
  • Simplify lengthy outputs (critique focuses on key sections)
  • Use smaller, faster model for feedback generation
