Skill Eval
Security checks across malware telemetry and agentic risk
Overview
This skill’s evaluation behavior is coherent and disclosed, but it spawns evaluation sessions and keeps histories, so use it with test data and safe prompts.
This skill appears appropriate for evaluating OpenClaw skills. Before installing, plan to run it in a test workspace, use non-production prompts and credentials, and review or delete retained eval histories and output folders after use.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
An eval can run multiple agent sessions and those sessions may use tools available in the environment.
The skill intentionally launches subagent sessions to run eval tasks. This is core to the evaluation purpose, but eval prompts can cause real tool activity depending on the evaluated skill.
sessions_spawn(task=task, mode="run", cleanup="keep", label="trigger-eval-{id}")Use read-only or test eval prompts, especially for skills connected to production services or accounts.
If the evaluated skill has access to credentials or real services, eval sessions may also be able to use them.
Inherited sandboxing is disclosed and helps the eval access registered skills, but it means spawned sessions may share the user’s local environment and available tool permissions.
`sandbox="inherit"` — subagents must inherit the skill registration environment
Run evaluations in a test workspace or with limited-scope/test credentials where possible.
Evaluation output folders and retained sessions may contain private prompts, tool outputs, paths, or service data.
The skill stores full evaluation records, including tool calls and results, which may include sensitive information from test sessions.
Keep full records — save `full_history.json` (including tool_use + tool_result)
Avoid running evals with secrets or private production data, and clean up retained sessions/output directories when no longer needed.
A malicious or adversarial eval transcript could distort grading results or recommendations.
Raw transcripts are inserted into grader prompts for analysis. This is expected for grading, but transcript text could contain instructions that try to influence the grader.
--- VARIANT A TRANSCRIPT ---\n{variant_a_transcript}\n\n--- VARIANT B TRANSCRIPT ---\n{variant_b_transcript}Treat grader outputs as evaluative evidence, not authority; consider adding explicit instructions that transcript content is untrusted data.
