Improvement Learner

Pass. Audited by ClawScan on May 10, 2026.

Overview

This skill appears purpose-aligned, but it can change selected skill files, call Claude to judge content, and save evaluation memory, so users should review paths and privacy settings.

Installing and using this skill appears reasonable if you want automated skill evaluation and improvement. Run it only on intended skill directories, prefer version control, use --mock if you do not want Claude calls, and keep its memory/output directories somewhere you can inspect or delete.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Running the self-improvement workflow may change files in the skill path you provide.

Why it was flagged

The skill is designed to alter a selected skill as part of its improvement loop. This is purpose-aligned and disclosed, but it is still mutation authority over local skill files.

Skill content
Real Karpathy self-improvement loop: evaluate → modify → re-evaluate → keep/revert → repeat.
Recommendation

Run it only on the intended skill directory, keep backups or version control, and review generated diffs before relying on the result.
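The evaluate → modify → re-evaluate → keep/revert loop described in this finding can be sketched as follows. This is an illustrative reconstruction, not the skill's actual code; `evaluate` and `modify` are hypothetical callables, and the snapshot/revert step shows why backups or version control matter.

```python
# Hypothetical sketch of the evaluate -> modify -> re-evaluate -> keep/revert
# loop described above; function names are illustrative, not the skill's API.
import shutil
import tempfile
from pathlib import Path

def improve_once(skill_path: Path, evaluate, modify) -> bool:
    """Apply one improvement attempt, keeping the change only if the score rises."""
    backup = Path(tempfile.mkdtemp()) / skill_path.name
    shutil.copy(skill_path, backup)          # snapshot before mutating
    baseline = evaluate(skill_path.read_text())
    skill_path.write_text(modify(skill_path.read_text()))
    improved = evaluate(skill_path.read_text())
    if improved > baseline:
        return True                          # keep the modification
    shutil.copy(backup, skill_path)          # revert to the snapshot
    return False
```

Note that the revert path only protects you if the snapshot itself is outside the mutated directory; version control gives the same guarantee with a reviewable diff.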

Note · High Confidence
ASI01: Agent Goal Hijack
What this means

A malicious or adversarial skill being evaluated could skew its quality score, which may affect improvement decisions.

Why it was flagged

The evaluated SKILL.md is embedded directly into the LLM judge prompt. If the evaluated skill contains prompt-injection text, it could try to manipulate the judge's scoring response.

Skill content
SKILL.md content:\n---\n{skill_content}\n---
Recommendation

Treat LLM-judge scores as advisory for untrusted skills and use mock/static checks or human review for high-stakes evaluations.

What this means

Skill contents may be sent through the user's configured Claude environment and may consume tokens.

Why it was flagged

In default LLM-judge mode, the script sends a prompt containing up to 8000 characters of SKILL.md content to the configured Claude CLI/provider.

Skill content
result = subprocess.run(["claude", "-p", "--output-format", "json"], input=prompt, capture_output=True, text=True, timeout=120)
Recommendation

Use --mock when you do not want an LLM call, and avoid evaluating files that contain secrets or private material unless your provider settings allow it.
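The difference between the default LLM-judge path and --mock can be sketched as below. The CLI invocation follows the snippet quoted in this finding; the `judge` wrapper, the prompt text, and the mock reply are illustrative assumptions, while the 8000-character cap mirrors the limit described above.

```python
# Hypothetical wrapper around the `claude` CLI call quoted in the finding.
# In mock mode no subprocess is spawned: no tokens are consumed and no
# skill content leaves the machine.
import subprocess

MAX_CHARS = 8000  # truncation limit described in the finding

def judge(skill_content: str, mock: bool = False) -> str:
    prompt = "Evaluate this skill:\n" + skill_content[:MAX_CHARS]
    if mock:
        return '{"score": 5}'  # canned reply; assumed shape, for offline runs
    result = subprocess.run(
        ["claude", "-p", "--output-format", "json"],
        input=prompt, capture_output=True, text=True, timeout=120,
    )
    return result.stdout
```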

What this means

Evaluation context and patterns can persist across runs and influence future recommendations.

Why it was flagged

The skill persists improvement outcomes and context into HOT/WARM memory JSON files for later reuse.

Skill content
entry = {"type": improvement_type, "succeeded": succeeded, "context": context, "timestamp": utc_now_iso(), "hit_count": 1}
Recommendation

Store memory in a known project-local directory, periodically inspect or clear it, and avoid including sensitive context in evaluation inputs.
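Keeping the persisted memory inspectable can look like the sketch below: entries of the shape quoted in this finding written to a known project-local file, plus a clear-out you can run after sensitive evaluations. The file path and function names are assumptions, not the skill's actual layout.

```python
# Illustrative sketch: persist improvement entries (shape taken from the
# finding's quoted code) to a project-local JSON file you can inspect or delete.
import json
from datetime import datetime, timezone
from pathlib import Path

MEMORY_FILE = Path(".skill_memory/hot.json")  # assumed project-local location

def record(improvement_type: str, succeeded: bool, context: str) -> None:
    entries = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    entries.append({
        "type": improvement_type,
        "succeeded": succeeded,
        "context": context,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "hit_count": 1,
    })
    MEMORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    MEMORY_FILE.write_text(json.dumps(entries, indent=2))

def clear_memory() -> None:
    """Delete the persisted memory, e.g. after evaluating sensitive inputs."""
    if MEMORY_FILE.exists():
        MEMORY_FILE.unlink()
```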

What this means

The script may not run, or may depend on local library code not included in this review context.

Why it was flagged

The script imports lib.common and lib.pareto from a repository root outside the listed skill files, while the registry install section declares no install spec.

Skill content
_REPO_ROOT = Path(__file__).resolve().parents[3]
if str(_REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(_REPO_ROOT))
from lib.common import read_json, write_json, utc_now_iso
from lib.pareto import ParetoFront, ParetoEntry
Recommendation

Verify the expected lib.common and lib.pareto modules are present and trusted in the runtime environment before running the scripts.