🌿 Skill Garden — Skill Evolution Engine
"Every skill should get better the more you use it."
Philosophy
Skill Garden treats skill improvement as a continuous, invisible process — not a special operation. It runs passively in the background, accumulating observations from every skill invocation, then periodically synthesizes them into concrete improvements.
Key design principles:
- Token-efficient: Lightweight structured logs, batch processing, no real-time overhead
- User-in-control: High-confidence changes auto-apply; uncertain ones ask first
- Transparent: Every change is explained; nothing happens silently
- Self-contained: Manages its own memory, dashboard, proposals, and cron schedule
Three-Layer Architecture
Layer 1 — Passive Observation (near-zero token cost)
- Every skill invocation → 1-line structured log entry
- Abnormal outcomes (FAIL/SLOW) → detailed log with evidence
Layer 2 — Weekly Batch Analysis (isolated agent, ~5-15 min)
- Read all accumulated logs
- Run the evaluation engine across the five dimensions
- Generate specific improvement proposals
Layer 3 — Targeted Modification (low frequency, high precision)
- Confidence ≥ 90% → apply immediately, notify user
- Confidence 70–89% → apply with [experimental] tag
- Confidence 50–69% → write to proposals, ask user
- Confidence < 50% → log as observation only
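As a minimal sketch of this routing (the function and action names here are hypothetical, not part of the shipped scripts):

```python
# Hypothetical sketch of the Layer 3 routing; thresholds match the tiers above.
def route_proposal(confidence: int) -> str:
    """Map a proposal's confidence (0-100) to the action Skill Garden takes."""
    if confidence >= 90:
        return "apply_and_notify"        # edit SKILL.md, then tell the user
    if confidence >= 70:
        return "apply_experimental"      # edit, tagged [experimental]
    if confidence >= 50:
        return "queue_for_review"        # write to improvement_proposals.md, ask user
    return "log_observation"             # record the signal, change nothing
```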
When This Skill Activates
Trigger 1 — After every skill use (automatic, passive):
When any skill finishes executing, immediately log the outcome using the format in references/usage_tracker.md. This is the most important layer — it costs almost nothing and feeds everything else.
Trigger 2 — On user request:
- "grow this skill" / "improve skill" / "optimize skill"
- "why did this fail" / "analyze this skill"
- "run skill analysis" / "check my skills"
- "skill health" / "skill dashboard"
Trigger 3 — On schedule (automatic, every Sunday 20:00):
An isolated agent runs batch_analyze.py and generate_report.py, applies high-confidence improvements, and sends you a summary.
Layer 1: Passive Observation
After any skill finishes (any outcome: OK, FAIL, PARTIAL, SLOW, SKIP), immediately write a structured log. Use log_insight.py or write directly to the skill's references/usage_log.md.
Log Entry Format
For OK outcomes with nothing notable (minimal tokens):
```
## YYYY-MM-DD HH:MM
Trigger: [trigger in ≤10 words]
Outcome: OK
Signal: [one-line finding or "No issues"]
```
For PARTIAL, FAIL, SLOW outcomes (always log all fields):
```
## YYYY-MM-DD HH:MM

### Trigger
[What the user asked for, ≤10 words]

### Outcome
OK | PARTIAL | FAIL | SLOW | SKIP

### Signal
[One specific phrase: what this tells us about the skill]

Examples:
- "Covered: standard use case works perfectly"
- "Missing: error handling for network timeouts"
- "Ambiguous: step 3 could be interpreted two ways"
- "Outdated: API version in skill doesn't match current"

### Evidence
[1-2 sentences. Quote or paraphrase exact output/error. Be specific.]

### Flags
[One or more tags: [new_trigger] [missing_coverage] [confusing_step]
[outdated_info] [token_heavy] [edge_case] [user_workaround_used]
[config_stale] [api_change] [Covered] [success_boost]]
```
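For illustration, the append logic behind entries like these can be as simple as the following sketch; `write_log_entry` is a hypothetical helper that mirrors the schema above, not necessarily how log_insight.py is implemented:

```python
from datetime import datetime
from pathlib import Path

def write_log_entry(skill_dir: str, trigger: str, outcome: str,
                    signal: str, evidence: str = "", flags: str = "") -> None:
    """Append a structured entry to the skill's references/usage_log.md."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    if outcome == "OK" and not evidence:
        # Minimal form for unremarkable successes.
        entry = f"\n## {stamp}\nTrigger: {trigger}\nOutcome: OK\nSignal: {signal}\n"
    else:
        # Full form for PARTIAL / FAIL / SLOW (and notable OKs).
        entry = (f"\n## {stamp}\n### Trigger\n{trigger}\n### Outcome\n{outcome}\n"
                 f"### Signal\n{signal}\n### Evidence\n{evidence}\n### Flags\n{flags}\n")
    log = Path(skill_dir) / "references" / "usage_log.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as f:
        f.write(entry)
```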
Using log_insight.py
```bash
# Quick OK log (minimal)
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/log_insight.py \
  --skill github-trending-summary \
  --trigger "daily top 5 repos" \
  --outcome OK \
  --signal "Covered: standard case"

# Detailed failure log
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/log_insight.py \
  --skill banxuebang-helper \
  --trigger "check homework" \
  --outcome FAIL \
  --signal "Missing: semester selector not dynamic" \
  --evidence "Config hardcoded to 2024-2025 but API shows 2025-2026 is current." \
  --flags "missing_coverage,config_stale" \
  --mark-landmark "SkillImproved"
```
Rule of thumb: If you had to pause, reconsider, or work around something — log it with full detail. If it just worked perfectly — log minimally. The goal is signal, not noise.
Layer 2: Weekly Batch Analysis
Run manually or wait for the Sunday cron trigger.
Manual Trigger
Say: "run skill analysis" or "grow all skills"
The analysis does the following in order:
- Scan all skills — read every `references/usage_log.md` (see the sketch after this list)
- Evaluate each skill across the five dimensions (see `references/evaluation_engine.md`)
- Generate proposals — one for each skill scoring below threshold
- Apply high-confidence changes — auto-edit SKILL.md for confident improvements
- Update dashboard — rewrite `references/dashboard.md`
- Notify user — send a summary message
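A minimal sketch of the scanning step, assuming skills live under ~/.openclaw/workspace/skills; the real scoring and proposal generation in batch_analyze.py build on tallies like these:

```python
import re
from pathlib import Path

SKILLS_ROOT = Path.home() / ".openclaw" / "workspace" / "skills"  # assumed layout

def scan_usage_logs() -> dict[str, dict[str, int]]:
    """Tally outcome counts per skill from each references/usage_log.md."""
    tallies: dict[str, dict[str, int]] = {}
    for skill_dir in sorted(SKILLS_ROOT.iterdir()):
        log = skill_dir / "references" / "usage_log.md"
        if not log.is_file():
            continue  # skill has no observations yet
        text = log.read_text(encoding="utf-8")
        counts: dict[str, int] = {}
        # Matches both "Outcome: OK" (minimal form) and "### Outcome\nFAIL" (full form).
        for outcome in re.findall(r"Outcome:?\s*(OK|PARTIAL|FAIL|SLOW|SKIP)", text):
            counts[outcome] = counts.get(outcome, 0) + 1
        tallies[skill_dir.name] = counts
    return tallies
```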
Running Scripts Directly
```bash
# Full batch analysis (evaluate all skills, generate proposals)
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/batch_analyze.py

# Analyze one skill only
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/batch_analyze.py --skill github-trending-summary

# Dry run (proposals only, don't apply)
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/batch_analyze.py --dry-run --min-confidence 70

# Generate/refresh dashboard
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/generate_report.py

# Output as JSON (for integrations)
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/generate_report.py --output json
```
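If another tool consumes the JSON output, something like the following works; note that the field names in the parsed report are assumptions here, so inspect the real output first:

```python
import json
import subprocess
from pathlib import Path

SCRIPT = (Path.home() / ".openclaw" / "workspace" / "skills"
          / "skill-garden" / "scripts" / "generate_report.py")

# Run the generator in JSON mode and parse its stdout.
result = subprocess.run(
    ["python3", str(SCRIPT), "--output", "json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)

# "skills" is an assumed top-level key; adapt to the actual schema.
for name, info in report.get("skills", {}).items():
    print(name, info)
```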
Layer 3: Applying Improvements
The Five Evaluation Dimensions
| Dimension | Weight | What It Measures |
|---|---|---|
| Coverage | 30% | Does the skill's description match how it's actually used? |
| Completeness | 25% | Are all necessary steps present? Do FAIL events reveal missing coverage? |
| Clarity | 20% | Are steps unambiguous? Are there [confusing_step] or [user_workaround_used] flags? |
| Currency | 15% | Is the information still accurate? Are there [outdated_info] or [config_stale] flags? |
| Efficiency | 10% | Is it unnecessarily verbose or token-heavy? |
See references/evaluation_engine.md for the full evaluation algorithm, scoring thresholds, and confidence calibration guide.
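As a worked sketch of how the weights combine, assuming each dimension is scored 0-100 and the overall score is their weighted sum (the reference file holds the authoritative algorithm):

```python
# Weights from the table above; per-dimension scores are assumed to be 0-100.
WEIGHTS = {
    "coverage": 0.30,
    "completeness": 0.25,
    "clarity": 0.20,
    "currency": 0.15,
    "efficiency": 0.10,
}

def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of the per-dimension scores."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

# Example: a skill that is strong everywhere except currency.
print(overall_score({"coverage": 90, "completeness": 85, "clarity": 80,
                     "currency": 40, "efficiency": 95}))  # 79.75
```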
Applying an Edit to SKILL.md
When a proposal meets the confidence threshold:
- Read the current SKILL.md
- Identify the exact text to replace using the `edit` tool
- Write the improved version
- Add a brief changelog note at the top of the edit:
```
<!-- Auto-improved by Skill Garden: YYYY-MM-DD
Reason: [confidence]% confidence — [evidence summary] -->
```
- Update `references/improvement_proposals.md` to mark the proposal as applied
- Notify the user with a summary of what changed
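A small sketch of building that changelog note in code; the helper name and the sample evidence string are illustrative:

```python
from datetime import date

def changelog_note(confidence: int, evidence: str) -> str:
    """Build the comment that precedes an auto-applied edit in SKILL.md."""
    return (f"<!-- Auto-improved by Skill Garden: {date.today():%Y-%m-%d}\n"
            f"Reason: {confidence}% confidence — {evidence} -->\n")

# Usage: prepend the note to the improved text before writing it back.
improved_section = "Resolve the current semester from the API instead of config.\n"
print(changelog_note(92, "3 FAIL logs traced to a stale semester config") + improved_section)
```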
Editing Checklist
Before applying any edit:
- Confirm the proposal's confidence meets the threshold for the intended action (auto-apply, [experimental], or ask first)
- Re-read the current SKILL.md so the replacement targets the exact existing text
- Add the changelog note with date, confidence, and evidence summary
- Mark the proposal as applied in `references/improvement_proposals.md`
- Notify the user with a summary of what changed
Dashboard
The dashboard (references/dashboard.md) shows:
- Overall skill ecosystem health
- Per-skill scores across all five dimensions
- Recent signals and flags
- Pending proposals
- Weekly outcome distribution
- Recent landmark events
Regenerate with:
```bash
python3 ~/.openclaw/workspace/skills/skill-garden/scripts/generate_report.py
```
Cron Setup
Set up a weekly Sunday 20:00 analysis run:
```bash
openclaw cron add \
  --name "Skill Garden Weekly Analysis" \
  --schedule '{"kind":"cron","expr":"0 20 * * 0","tz":"Asia/Shanghai"}' \
  --sessionTarget isolated \
  --payload '{"kind":"agentTurn","message":"Run full skill analysis: execute batch_analyze.py then generate_report.py and notify me of any improvements applied.","timeoutSeconds":900}'
```
Or trigger it manually by saying "grow all skills".
Reference Files
| File | Purpose |
|---|---|
| `references/usage_tracker.md` | Full logging schema, log rotation, landmark events |
| `references/evaluation_engine.md` | Scoring algorithm, thresholds, confidence calibration |
| `references/improvement_examples.md` | Real improvement examples with full context |
| `references/dashboard.md` | Auto-generated growth dashboard (don't edit manually) |
| `references/master_log.md` | Skill Garden's own lightweight master log |
| `scripts/log_insight.py` | CLI for writing log entries |
| `scripts/batch_analyze.py` | Core analysis engine |
| `scripts/generate_report.py` | Dashboard generator |
Examples in Context
Example 1: Passive Log — Skill Works Perfectly
User: "给我推荐一首今天听的歌"
Skill: daily-song-recommender — succeeds, email sent
Your action: Write one-line abbreviated log:
## 2026-04-22 09:15
Trigger: daily song recommendation
Outcome: OK
Signal: Covered: standard use case, no issues
Example 2: Passive Log — Skill Fails Due to Missing Coverage
User: "查一下这周的数学作业"
Skill: banxuebang-helper — API error: semester field stale
Your action: Write full structured log:
## 2026-04-22 21:30
### Trigger
banxuebang: check math homework
### Outcome
FAIL
### Signal
Missing: semester config hardcoded, not dynamic
### Evidence
API returned {"code": 500, "msg": "semester not found"}.
Config in skill has "2024-2025下学期" but actual is "2025-2026下学期".
### Flags
[missing_coverage] [config_stale]
Example 3: Passive Log — New Trigger Not in Description
User: "爬一下GitHub今日热榜发到我邮箱"
Skill: github-trending-summary — works, but user's phrasing not in description
Your action: Write log with [new_trigger] flag:
## 2026-04-22 10:00
### Trigger
github-trending-summary: crawl and email
### Outcome
OK
### Signal
Covered: but user said "爬" not in description triggers
### Evidence
Skill handled it fine, but description doesn't mention "爬" as a trigger phrase.
### Flags
[new_trigger]
Example 4: User Requests Analysis
User: "run skill analysis"
Your action:
- Run `batch_analyze.py --dry-run`
- Read the proposals from the output
- Apply high-confidence changes (≥90%) via the `edit` tool
- Run `generate_report.py` to refresh the dashboard
- Message the user: "Found N improvement(s) — applied X automatically, Y need your review"
Example 5: Weekly Cron Fires (Sunday 20:00)
Isolated agent runs the full cycle:
- `batch_analyze.py` scans all 20 installed skills
- Finds `github-trending-summary`: 1 [new_trigger] flag, coverage 66%
- Generates a proposal with 65% confidence → written to proposals
- Updates dashboard
- You receive: "🌿 Weekly analysis done. github-trending-summary needs description update (65% confidence — needs more data to auto-apply). Review?"