Improvement Learner
Advisory
Audited by VirusTotal on Apr 12, 2026.
Overview
Type: OpenClaw Skill
Name: improvement-learner
Version: 1.2.0

The improvement-learner skill bundle is a utility designed to evaluate and enhance the quality of other OpenClaw skills through a self-improvement loop. It uses subprocess calls to run tests (pytest) and an LLM-as-judge (via the claude CLI) to score skill documentation, and it modifies files to apply structural improvements or redact secrets. While it possesses broad file-system and execution capabilities, its logic in scripts/self_improve.py is strictly aligned with its stated purpose and includes proactive security features, such as detecting hardcoded credentials and internal project leakage.
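This review does not show the script's actual detection rules. As a minimal sketch of the kind of hardcoded-credential check described above, assuming regex-based matching (the patterns and the redact_secrets helper are illustrative, not the skill's real API):

    import re

    # Illustrative patterns only; the skill's actual rules are not shown in this review.
    SECRET_PATTERNS = [
        re.compile(r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
        re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    ]

    def redact_secrets(text: str) -> tuple[str, int]:
        """Replace likely hardcoded credentials with a placeholder; return text and hit count."""
        hits = 0
        for pattern in SECRET_PATTERNS:
            text, n = pattern.subn("[REDACTED]", text)
            hits += n
        return text, hits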
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the self-improvement workflow may change files in the skill path you provide.
The skill is designed to alter a selected skill as part of its improvement loop. This is purpose-aligned and disclosed, but it is still mutation authority over local skill files.
Real Karpathy self-improvement loop: evaluate → modify → re-evaluate → keep/revert → repeat.
Run it only on the intended skill directory, keep backups or version control, and review generated diffs before relying on the result.
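To make the keep/revert step concrete, here is a hypothetical sketch of one loop iteration that snapshots the directory before mutating it; evaluate_skill and apply_improvement are assumed callables, not the script's actual functions:

    import shutil
    from pathlib import Path

    def improve_once(skill_dir: Path, evaluate_skill, apply_improvement) -> bool:
        """One evaluate → modify → re-evaluate → keep/revert iteration over skill_dir."""
        backup = skill_dir.with_suffix(".bak")
        shutil.copytree(skill_dir, backup, dirs_exist_ok=True)  # keep a restorable copy
        before = evaluate_skill(skill_dir)
        apply_improvement(skill_dir)  # mutates files inside skill_dir
        after = evaluate_skill(skill_dir)
        if after <= before:  # no improvement: restore the backup
            shutil.rmtree(skill_dir)
            shutil.copytree(backup, skill_dir)
            return False
        return True  # improvement kept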
A malicious or adversarial skill being evaluated could skew its quality score, which may affect improvement decisions.
The evaluated SKILL.md is embedded directly into the LLM judge prompt. If the evaluated skill contains prompt-injection text, it could try to manipulate the judge's scoring response.
Evidence (prompt construction):

    SKILL.md content:\n---\n{skill_content}\n---

Treat LLM-judge scores as advisory for untrusted skills and use mock/static checks or human review for high-stakes evaluations.
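One way to act on that advice in code, assuming you control the judging side: clamp the judge's score and blend it with a deterministic static score, so injected text cannot fully dictate the outcome. The guarded_score helper, the 0-10 scale, and the 50/50 blend are all illustrative assumptions:

    import json

    def guarded_score(judge_json: str, static_score: float) -> float:
        """Parse an LLM judge response defensively and bound its influence.

        judge_json: raw JSON text returned by the judge.
        static_score: a cheap deterministic score (e.g., lint/structure checks) in [0, 10].
        """
        try:
            score = float(json.loads(judge_json).get("score", static_score))
        except (json.JSONDecodeError, AttributeError, TypeError, ValueError):
            score = static_score  # unparseable or manipulated output: fall back
        score = max(0.0, min(10.0, score))  # clamp injected extremes
        # Blend so a manipulated judge cannot fully override static checks.
        return 0.5 * score + 0.5 * static_score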
Skill contents may be sent through the user's configured Claude environment and may consume tokens.
In default LLM-judge mode, the script sends a prompt containing up to 8000 characters of SKILL.md content to the configured Claude CLI/provider.
Evidence:

    result = subprocess.run(
        ["claude", "-p", "--output-format", "json"],
        input=prompt, capture_output=True, text=True, timeout=120,
    )
Use --mock when you do not want an LLM call, and avoid evaluating files that contain secrets or private material unless your provider settings allow it.
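A sketch of that opt-out, reusing the subprocess call shown in the evidence above; the --mock flag is documented by the skill, but the argparse wiring, prompt text, and canned JSON here are assumptions:

    import argparse
    import subprocess

    parser = argparse.ArgumentParser()
    parser.add_argument("--mock", action="store_true",
                        help="skip the LLM call and return a canned score")
    args = parser.parse_args()

    prompt = "Score this SKILL.md from 0 to 10. Reply as JSON."  # illustrative prompt
    if args.mock:
        judge_output = '{"score": 5.0, "mock": true}'  # no provider call, no tokens
    else:
        result = subprocess.run(
            ["claude", "-p", "--output-format", "json"],
            input=prompt, capture_output=True, text=True, timeout=120,
        )
        judge_output = result.stdout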
Evaluation context and patterns can persist across runs and influence future recommendations.
The skill persists improvement outcomes and context into HOT/WARM memory JSON files for later reuse.
Evidence:

    entry = {
        "type": improvement_type,
        "succeeded": succeeded,
        "context": context,
        "timestamp": utc_now_iso(),
        "hit_count": 1,
    }

Store memory in a known project-local directory, periodically inspect or clear it, and avoid including sensitive context in evaluation inputs.
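A small maintenance sketch for that advice, assuming the HOT/WARM tiers are JSON lists in a project-local memory/ directory (the file names and layout are assumptions):

    import json
    from pathlib import Path

    MEMORY_DIR = Path("memory")  # assumed project-local location

    def inspect_memory() -> None:
        """Print each persisted entry so stale or sensitive context is easy to spot."""
        for tier in ("hot.json", "warm.json"):
            path = MEMORY_DIR / tier
            if path.exists():
                for entry in json.loads(path.read_text()):
                    print(tier, entry.get("type"), entry.get("timestamp"))

    def clear_memory() -> None:
        """Reset both tiers to empty lists."""
        for tier in ("hot.json", "warm.json"):
            (MEMORY_DIR / tier).write_text("[]")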
The script may not run, or may depend on local library code not included in this review context.
The script imports lib.common and lib.pareto from a repository root outside the listed skill files, while the registry install section declares no install spec.
Evidence (scripts/self_improve.py):

    _REPO_ROOT = Path(__file__).resolve().parents[3]
    if str(_REPO_ROOT) not in sys.path:
        sys.path.insert(0, str(_REPO_ROOT))
    from lib.common import read_json, write_json, utc_now_iso
    from lib.pareto import ParetoFront, ParetoEntry

Verify that the expected lib.common and lib.pareto modules are present and trusted in the runtime environment before running the scripts.
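A pre-flight check along those lines, using only the standard library; the module names come from the imports above, and the parents[3] arithmetic mirrors the evidence (it assumes this check runs from the same location as scripts/self_improve.py):

    import importlib.util
    import sys
    from pathlib import Path

    # Mirror the script's path setup: three levels above this file.
    repo_root = Path(__file__).resolve().parents[3]
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))

    for module in ("lib.common", "lib.pareto"):
        try:
            spec = importlib.util.find_spec(module)
        except ModuleNotFoundError:
            spec = None
        if spec is None:
            sys.exit(f"missing dependency: {module} (expected under {repo_root})")
        print(module, "->", spec.origin)  # confirm which file will actually be imported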
