agentic-eval
v1.0.0
Patterns and techniques for evaluating and improving AI agent outputs. Use this skill when:
- Implementing self-critique and reflection loops
- Building eval...
⭐ 0 · 347 · 6 current · 7 all-time
by santian@boleyn
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (agentic evaluation, reflection loops, evaluator-optimizer patterns) match the SKILL.md content. No unrelated binaries, env vars, or install steps are requested.
Instruction Scope
SKILL.md stays on-topic (generating, evaluating, critiquing, refining) and uses LLM calls and structured JSON. It also suggests running tests (run_tests) and executing generated code in a loop. This is expected for code-refinement patterns, but the generated code and test harnesses should be sandboxed and access-controlled.
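As a rough illustration of the isolation concern above, the sketch below runs a generated snippet in a child Python process with a timeout, a throwaway working directory, and an allowlisted environment. The name `run_untrusted` and the allowlist are hypothetical and not part of the skill. A child process is not a real sandbox: it still shares the filesystem and network, so a container or VM would be needed for actual isolation.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: only these variables are needed to start the interpreter.
ENV_ALLOWLIST = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run generated code in a child process; NOT a full sandbox."""
    # Pass through only allowlisted variables so secrets held in the
    # parent environment are not visible to the generated code.
    env = {k: os.environ[k] for k in ENV_ALLOWLIST if k in os.environ}
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignores PYTHON* variables and user site.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                env=env,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

The returned dictionary can be fed back to the refinement loop as test feedback; a timeout is reported as a failure rather than raised, so one hanging candidate does not stall the loop.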
Install Mechanism
No install spec and no code files — instruction-only. This minimizes on-disk/third-party install risk.
Credentials
Skill requires no environment variables, credentials, or config paths. Nothing disproportionate to the stated purpose is requested.
Persistence & Privilege
The always setting is false and model invocation is allowed (the platform default). The skill does not request permanent presence or modify other skills/settings.
Assessment
This skill appears coherent and benign, but it prescribes executing generated code and running tests as part of refinement loops. Before deploying: (1) run generated code and tests inside a sandbox or isolated CI environment to avoid executing untrusted code; (2) enforce iteration limits and convergence checks, and parse and validate LLM JSON outputs, to avoid infinite loops or malformed feedback; (3) treat LLM evaluations as fallible (an LLM-as-judge can hallucinate) and add human review for critical decisions; (4) ensure the agent or environment executing tests has no access to secrets or sensitive systems. If you want to restrict autonomous behavior, consider disabling automatic invocation or requiring human confirmation for execution steps.
Like a lobster shell, security has layers: review code before you run it.
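Point (2) of the assessment above can be sketched as a bounded refinement loop. This is a minimal illustration, not the skill's own code: `parse_feedback`, `refine`, the `score`/`critique` JSON fields, and the thresholds are all hypothetical, and `evaluate` and `revise` stand in for whatever LLM calls the caller supplies.

```python
import json
from typing import Callable, Optional, Tuple


def parse_feedback(raw: str) -> Optional[dict]:
    """Validate evaluator output; return None if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    score, critique = data.get("score"), data.get("critique")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    if not 0.0 <= score <= 1.0 or not isinstance(critique, str):
        return None
    return {"score": float(score), "critique": critique}


def refine(
    draft: str,
    evaluate: Callable[[str], str],     # returns raw JSON text (assumed)
    revise: Callable[[str, str], str],  # (draft, critique) -> new draft
    max_iters: int = 3,
    target: float = 0.9,
    min_gain: float = 0.01,
) -> Tuple[str, Optional[float]]:
    """Return the best draft seen and its score (None if never scored)."""
    best, best_score = draft, None
    current, critique = draft, ""
    for i in range(max_iters):          # hard iteration limit
        if i > 0:
            current = revise(current, critique)
        feedback = parse_feedback(evaluate(current))
        if feedback is None:
            break                       # malformed feedback: stop, keep best
        score, critique = feedback["score"], feedback["critique"]
        gain = score if best_score is None else score - best_score
        if best_score is None or score > best_score:
            best, best_score = current, score
        if score >= target or gain < min_gain:
            break                       # good enough, or no longer improving
    return best, best_score
```

Returning the best draft rather than the last one means a regression in a late iteration does not overwrite an earlier, better result. The score still comes from a fallible judge, so critical outputs would need human review on top of this.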
latest vk9714j46kdtemrtmw5thams38182j4ja
