Live Evo: Online Evolution with verified experiences

v0.1.0

Self-evolving memory system that learns from verifiable tasks. Use when completing tasks where you can verify the outcome (coding, predictions, analysis). Au...

by YaolunZhang (@mercury7353)
Security Scan

  • VirusTotal: Benign
  • OpenClaw: Benign (high confidence)
Purpose & Capability
The skill name/description (self-evolving memory for verifiable tasks) matches what the code and SKILL.md do: add, list, retrieve, generate guidelines from, and reweight past experiences. There are no unrelated requirements (no credentials, no unrelated binaries).
Instruction Scope
SKILL.md instructs the agent to run the included scripts and explicitly tells how to call them; those scripts only read/write the experience DB and local weight history. They do not read other system configuration or environment variables. Note: SKILL.md and the scripts insist on persistent storage at ~/.live-evo, so any task text, user feedback, or answers you pass to these scripts will be stored on disk.
Install Mechanism
There is no install spec — this is instruction-only plus bundled Python scripts. No packages are downloaded or executed from external URLs. Risk from installation is low; executing the provided Python scripts runs local code included in the skill package.
Credentials
The skill requests no environment variables or external credentials (proportional). However, it persistently stores user-provided inputs (questions, failure reasons, improvements) under ~/.live-evo/experience_db.jsonl and writes weight history to ~/.live-evo/weight_history.jsonl. This is expected for a memory system but may capture sensitive data if you pass secrets or private content into the scripts.
Persistence & Privilege
The `always` flag is false and model invocation is allowed (both normal). The skill writes its own data under the user's home directory (~/.live-evo), a reasonable level of persistence for a memory skill, but that data persists across agent runs and is neither encrypted nor access-controlled by the skill. The skill does not modify other skills or system-wide settings.
Assessment
This skill appears to do what it claims: it stores and retrieves 'experiences' locally to help produce task-specific guidelines and adjust weights based on verification. Before installing/using:

  1. Review the included Python files (they are small and local) — no network/exfiltration code is present.
  2. Avoid passing secrets or private data (API keys, passwords, private messages, proprietary code) into the add/update/retrieve commands, because those strings are stored in plaintext under ~/.live-evo.
  3. Note that a referenced bundled seed path (experiences/experience_db.jsonl) is looked up on first run — if present it will be copied into ~/.live-evo; otherwise nothing is copied.
  4. If you want less persistence, run the scripts with a separate working directory or periodically delete/rotate ~/.live-evo.
  5. If you plan to share outputs produced using retrieved experiences, review them first to ensure they don't leak stored sensitive inputs.


Latest: vk978r8gn1n9w7drdrn7h2ap28s82aqmq · v0.1.0 · MIT-0
293 downloads · 1 star · 1 version · Updated 1 month ago

Live-Evo: Online Self-Evolving Memory

You are using the Live-Evo memory system that learns from past mistakes through experience accumulation and adaptive evaluation.

IMPORTANT — Script location: All scripts live in the scripts/ subdirectory next to this SKILL.md file. When running them, use the absolute path derived from this file's location: for example, if this SKILL.md is at /path/to/live-evo/SKILL.md, the scripts are at /path/to/live-evo/scripts/.

Experience data is stored persistently at ~/.live-evo/experience_db.jsonl (independent of skill installation location).

Core Workflow

1. Retrieve & Compile (Before Acting)

Run the experience retrieval script to find relevant past experiences:

python <scripts-dir>/retrieve.py --query "YOUR_TASK_DESCRIPTION"

If experiences are found, they will be compiled into a task-specific guideline. Use this guideline to inform your approach.
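The internals of retrieve.py are not shown here, but the documented behavior (query-relevant experiences, ranked so that higher-weight entries win) can be sketched minimally. Everything below — the function name, the keyword-overlap scoring, and the sample records — is an illustrative assumption, not the actual script:

```python
import json

def retrieve(db_lines, query, top_k=5):
    # Hypothetical sketch of retrieve.py's core: score each stored
    # experience by keyword overlap with the query, scaled by its weight,
    # so higher-weight experiences outrank equally relevant ones.
    query_terms = set(query.lower().split())
    scored = []
    for line in db_lines:
        exp = json.loads(line)
        overlap = len(query_terms & set(exp["question"].lower().split()))
        if overlap:
            scored.append((overlap * exp.get("weight", 1.0), exp))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exp for _, exp in scored[:top_k]]

# Made-up database lines for illustration only.
db = [
    json.dumps({"question": "fix failing pytest import error", "weight": 1.4,
                "improvement": "check sys.path before editing code"}),
    json.dumps({"question": "forecast quarterly revenue", "weight": 0.6,
                "improvement": "anchor on base rates"}),
]
hits = retrieve(db, "pytest import error in CI")
```

With this toy database, only the coding experience overlaps the query, so `hits` contains just that record; the real script may use a very different similarity measure.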

2. Decide: Verify or Direct Apply

You must judge whether contrastive verification (two attempts) is worthwhile based on:

| Factor | Do Contrastive Eval | Skip, Direct Apply |
| --- | --- | --- |
| Cost of re-running | Low (e.g. run a test) | High (e.g. long build, API costs, heavy computation) |
| Verifiability | Clear ground truth exists (tests, known answer) | No easy way to verify programmatically |
| Task complexity | Simple enough to attempt twice | Too complex/large to reasonably duplicate |
| Guideline relevance | Retrieved guideline is highly relevant | Guideline is loosely related or no guideline found |

If contrastive eval IS worthwhile → Go to Step 2A.
If contrastive eval is NOT worthwhile → Go to Step 2B.

Step 2A: Contrastive Evaluation (Two Attempts)

Make two independent attempts:

Attempt A (Without Memory):

  • Solve the task using only your base knowledge
  • Record your answer/approach

Attempt B (With Guideline):

  • Apply the retrieved guideline
  • Solve the task with this informed approach
  • Record your answer/approach

Then verify and update weights:

python <scripts-dir>/update.py \
  --task "TASK_DESCRIPTION" \
  --result-a "RESULT_WITHOUT_MEMORY" \
  --result-b "RESULT_WITH_GUIDELINE" \
  --correct "CORRECT_ANSWER" \
  --experience-ids "id1,id2,..."

Step 2B: Direct Apply with Feedback-Based Learning

When contrastive evaluation is not feasible:

  1. Apply the guideline directly (if one was retrieved) and complete the task
  2. Observe feedback from any of these sources:
    • User feedback (corrections, complaints, approval)
    • Environment signals (test results, error messages, build output)
    • Outcome observation (did the result work as expected?)
  3. Store experience directly if feedback reveals a lesson:
python <scripts-dir>/add_experience.py \
  --question "THE_TASK_QUESTION" \
  --failure-reason "What went wrong (from feedback)" \
  --improvement "Key lesson learned" \
  --category "coding|analysis|prediction|debugging|other"

No contrastive comparison needed — just learn from what happened.

3. Add New Experience (On Any Failure)

Whenever a task fails or feedback reveals a learnable lesson — regardless of which path you took — store the experience:

python <scripts-dir>/add_experience.py \
  --question "THE_TASK_QUESTION" \
  --failure-reason "What went wrong" \
  --improvement "Key lesson learned" \
  --category "coding|analysis|prediction|debugging|other"

4. Update Weights (When Possible)

If you used a retrieved guideline and can determine whether it helped:

python <scripts-dir>/update.py \
  --task "TASK_DESCRIPTION" \
  --result-a "WHAT_WOULD_HAVE_HAPPENED" \
  --result-b "WHAT_ACTUALLY_HAPPENED" \
  --correct "CORRECT_OUTCOME" \
  --experience-ids "id1,id2,..."

If you cannot determine whether the guideline helped, skip weight updates — no update is better than a wrong update.
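The exact update rule inside update.py is not documented here. As a rough sketch, assume a multiplicative reward/penalty keyed to the contrastive outcome, clamped to the 0.1–2.0 weight range stated in the Experience Format section; the function name and the 1.2/0.8 factors are invented for illustration:

```python
def update_weight(weight, with_guideline_correct, without_memory_correct,
                  reward=1.2, penalty=0.8, lo=0.1, hi=2.0):
    # Hypothetical contrastive update: an experience is rewarded when the
    # guideline-informed attempt (B) succeeded where the memory-free
    # attempt (A) failed, penalized in the opposite case, and left
    # unchanged on ties. The clamp matches the documented 0.1-2.0 range.
    if with_guideline_correct and not without_memory_correct:
        weight *= reward          # guideline demonstrably helped
    elif without_memory_correct and not with_guideline_correct:
        weight *= penalty         # guideline demonstrably hurt
    return min(hi, max(lo, weight))
```

An asymmetric scheme like this captures the "good experiences rise, bad ones fade" principle while the tie case honors "no update is better than a wrong update."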

When to Use Live-Evo

Use this system for:

  • Coding tasks: Bug fixes, implementations where tests can verify
  • Analysis tasks: Where ground truth can be checked
  • Predictions: Forecasting with eventual verification
  • Problem solving: Tasks with objectively correct answers
  • Any task with user feedback: Even without formal verification, user corrections are valuable signals

Experience Format

Each experience contains:

  • question: The original task/question
  • failure_reason: What went wrong in the original attempt
  • improvement: Key lesson or approach that would have helped
  • missed_information: Information sources or considerations that were missed
  • weight: Quality score (0.1-2.0) updated based on usefulness
  • category: Domain category for filtering
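A concrete record makes the format easier to picture. The field names below follow the list above; every value is made up for illustration, and this is one line as it might appear in ~/.live-evo/experience_db.jsonl:

```python
import json

# Illustrative experience record; field names match the documented
# format, values are invented.
experience = {
    "question": "Fix the flaky integration test in ci.yml",
    "failure_reason": "Assumed the test was deterministic; it depended on wall-clock time",
    "improvement": "Freeze time in tests before debugging assertions",
    "missed_information": "CI logs showed the failure only near midnight UTC",
    "weight": 1.0,
    "category": "debugging",
}
line = json.dumps(experience)  # one JSONL line in the database
```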

Key Principles

  1. Cost-Aware Verification: Only do contrastive evaluation when the cost is justified — don't waste tokens/time on expensive double-runs
  2. Feedback is Gold: User corrections, test failures, and error messages are direct learning signals — always store these
  3. Selective Acquisition: Only store experiences that contain a genuine, actionable lesson
  4. Weight-based Retrieval: Good experiences rise, bad ones fade
  5. Task-Specific Guidelines: Don't apply raw experiences — synthesize them into actionable guidance
  6. When in Doubt, Store: It's better to store a potentially useful experience than to miss a lesson; low-quality experiences will naturally decay via weight updates

Manual Commands

View all experiences:

python <scripts-dir>/list_experiences.py

Search experiences:

python <scripts-dir>/retrieve.py --query "your search query" --top-k 5

Get statistics:

python <scripts-dir>/stats.py
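What stats.py reports is not specified above; a plausible minimal sketch aggregates the JSONL database into per-category counts and average weights. The function and the sample lines are assumptions for illustration:

```python
import json
from collections import defaultdict

def stats(db_lines):
    # Hypothetical sketch of stats.py: per-category experience counts
    # and average weights computed from the JSONL experience database.
    counts = defaultdict(int)
    weight_sums = defaultdict(float)
    for line in db_lines:
        exp = json.loads(line)
        cat = exp.get("category", "other")
        counts[cat] += 1
        weight_sums[cat] += exp.get("weight", 1.0)
    return {cat: {"count": n, "avg_weight": weight_sums[cat] / n}
            for cat, n in counts.items()}

# Made-up database lines for illustration.
db = [
    '{"question": "q1", "weight": 1.5, "category": "coding"}',
    '{"question": "q2", "weight": 0.5, "category": "coding"}',
    '{"question": "q3", "weight": 1.0, "category": "analysis"}',
]
summary = stats(db)
```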
