skill-evaluation

Audited by ClawScan on May 14, 2026.

Overview

The visible package is a coherent skill-testing toolkit, with the main caution that evaluations and trigger probes should be run in a sandbox because they can invoke local AI tools and record outputs.

This skill appears safe to use for evaluating other skills, provided you follow its own sandbox-first guidance. Use disposable workspaces, mock external dependencies, avoid real credentials or private data, and run optional helper scripts only when you intend to test trigger behavior with your local AI platform tools.

Findings (4)

This is an artifact-based, informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Finding 1: Target skills may perform mutating actions

What this means

If a user evaluates an untrusted skill outside a sandbox, that target skill could modify files, call APIs, or perform browser actions.

Why it was flagged

The skill is designed to execute or observe target skills that may use mutating tools, but it explicitly establishes a sandbox and approval boundary.

Skill content
Enable approval mode — require human confirmation for all mutating tool calls (file writes, API calls, browser actions, shell commands) ... these MUST be sandboxed or mocked.
Recommendation

Run evaluations only in disposable workspaces with test data/accounts, keep approvals enabled, and mock external systems.
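The approval boundary quoted above can be sketched as a simple gate that requires human confirmation before any mutating tool call runs. This is an illustrative sketch, not the skill's actual implementation; `MUTATING_TOOLS`, `run_tool`, and the `confirm` callback are assumed names.

```python
# Sketch of an approval gate for mutating tool calls (illustrative names,
# not the skill's actual code).
MUTATING_TOOLS = {"file_write", "api_call", "browser_action", "shell_command"}

def run_tool(name, action, *, confirm=input):
    """Execute a tool action, asking for human confirmation if it mutates state."""
    if name in MUTATING_TOOLS:
        answer = confirm(f"Allow mutating tool '{name}'? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "denied", "tool": name}
    return {"status": "ok", "tool": name, "result": action()}
```

In a real harness the `confirm` callback would be the interactive approval prompt; injecting it as a parameter also makes the gate easy to test with mocked answers.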

Finding 2: Trigger evaluator runs a local AI CLI

What this means

Running the trigger evaluator can consume local AI-provider quota and runs the local platform tool in the current project context.

Why it was flagged

The optional trigger evaluator launches a local AI platform CLI subprocess to test whether a skill triggers. It uses argument lists rather than shell execution, which is aligned with its trigger-evaluation purpose.

Skill content
cmd = ["claude", "-p", query, "--output-format", "stream-json", ...] ... process = subprocess.Popen(cmd, ... cwd=str(project_root), env=env)
Recommendation

Only run the trigger evaluator intentionally, preferably in a test project, and verify the local CLI/account being used.
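Passing the command as an argument list, as the excerpt above does, means the query string is delivered as a single argument and never interpreted by a shell. A minimal sketch of that pattern, using a harmless stand-in command rather than the real `claude` CLI:

```python
import subprocess
import sys

def run_probe(cmd, cwd=None, env=None, timeout=60):
    """Launch a CLI probe as an argument list (no shell=True), so the query
    string cannot be shell-interpreted or expanded."""
    proc = subprocess.run(
        cmd, cwd=cwd, env=env,
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout

# Stand-in command: run the current Python interpreter instead of a real CLI.
code, out = run_probe([sys.executable, "-c", "print('probe ok')"])
```

Note that with an argument list, a query like `"; rm -rf ~"` is just an inert string argument, which is why the scanner treats this pattern as lower risk than shell execution.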

Finding 3: Subprocess inherits the user's environment

What this means

The local platform CLI may run under the user's existing account/session when trigger probes are executed.

Why it was flagged

The helper subprocess inherits the user's environment, which may include platform credentials or configuration expected by local AI CLIs. The artifacts do not show logging or exfiltration of those values.

Skill content
env = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"} ... subprocess.Popen(..., env=env)
Recommendation

Use a test account or limited environment for evaluations of untrusted prompts or skills, and avoid exposing unnecessary environment secrets.
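A stricter variant of the environment filtering shown in the excerpt is to allowlist only the variables the child process actually needs, rather than removing a single key. The allowlist below is illustrative, not part of the skill:

```python
import os

# Illustrative allowlist; the skill itself only strips CLAUDECODE.
ALLOWED_VARS = {"PATH", "HOME", "LANG", "TERM"}

def minimal_env(extra=None):
    """Build a child environment from allowlisted variables only, so
    inherited credentials and tokens are not passed to the subprocess."""
    env = {k: v for k, v in os.environ.items() if k in ALLOWED_VARS}
    if extra:
        env.update(extra)
    return env
```

An allowlist fails closed: a newly added secret in the parent environment stays out of the child unless explicitly named, whereas a denylist must be updated for every new sensitive variable.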

Finding 4: Reports may retain sensitive data

What this means

Generated reports may contain target-skill outputs, tool observations, or sample data that should not be shared if sensitive.

Why it was flagged

The evaluation schema stores actual outputs and case details in run artifacts, which is expected for reporting but may retain sensitive data if real inputs are used.

Skill content
## runs/run-{date}-v{N}/results.json ... "actual": "GET /api/users/123 -> {name, email, role, avatar}"
Recommendation

Use mock data for tests and review generated JSON/HTML reports before sharing or committing them.
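Before sharing run artifacts, sensitive fields recorded in the results structure can be scrubbed. The key names below are illustrative assumptions, not the skill's actual schema:

```python
# Illustrative set of sensitive keys; adjust to the actual report schema.
SENSITIVE_KEYS = {"email", "token", "password"}

def redact(obj):
    """Recursively replace values of sensitive keys in a results structure."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj

results = {"actual": {"name": "Ada", "email": "ada@example.com", "role": "admin"}}
clean = redact(results)
```

Redaction is a second line of defense; the primary recommendation stands, since mock inputs mean there is nothing sensitive to scrub in the first place.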