
Content Autoresearch

content-autoresearch
Install
openclaw skills install content-autoresearch
Latest Release
Version 1.0.0

---
name: autoresearch
description: >
  Implement Karpathy-style autoresearch loops for content optimization.
  Autonomous cycle: post content → wait for metrics → evaluate vs baseline →
  keep/discard strategy → evolve agent playbook → repeat. Use when an agent
  wants to systematically improve content performance through experiments with
  concrete metrics, binary keep/discard decisions, versioned strategy
  playbooks, and auto-revert on failure. Triggers: "autoresearch",
  "experiment loop", "optimize content strategy", "A/B test content",
  "track content performance", "evolve my posting strategy",
  "content experiment".
---

Autoresearch

Adapt Karpathy's autoresearch pattern (modify → train → check metric → keep/discard → repeat) for content creation agents. Instead of modifying train.py and checking loss, you modify your content strategy and check engagement metrics.

The key insight: treat content strategy like a neural net. You have a "champion" (current best strategy). You propose one mutation. You evaluate it against a concrete metric. You keep it or revert. No hand-waving, no "try harder" — binary decisions backed by data.

Core Loop

1. POST content using current champion strategy + one experimental mutation
2. WAIT for metrics (default: 48h evaluation window)
3. EVALUATE performance vs champion baseline
4. VERDICT: KEEP (new champion) / MODIFY (adjust mutation) / KILL (revert)
5. EVOLVE: Update SOUL.md champion playbook based on verdict
6. REPEAT from step 1

Cardinal rule: ONE experiment at a time. Never stack mutations. If you change posting time AND hook style simultaneously, you learn nothing.
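The steps above can be sketched as one pass of a loop driver. Each argument here (`post`, `wait_for_metrics`, `evaluate`, `evolve`) is a hypothetical agent-supplied callable, not part of the skill:

```python
def autoresearch_cycle(post, wait_for_metrics, evaluate, evolve) -> str:
    """Run one iteration of the autoresearch loop and return the verdict."""
    posts = post()                      # 1. POST champion strategy + one mutation
    metrics = wait_for_metrics(posts)   # 2. WAIT for the evaluation window
    verdict = evaluate(metrics)         # 3-4. EVALUATE -> KEEP / MODIFY / KILL
    evolve(verdict)                     # 5. EVOLVE the champion playbook
    return verdict                      # 6. caller decides whether to REPEAT
```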

Setup

1. Define Your Loss Function

Pick ONE primary metric. This is your loss function. Everything else is secondary.

| Content Type | Good Primary Metrics | Bad Primary Metrics |
|---|---|---|
| Video clips | Views per 48h | "Vibes", likes (too noisy) |
| Tweets/posts | Impressions per 24h | Follower count (too slow) |
| Blog posts | Reads per 7d | Comments (too sparse) |
| Newsletter | Open rate per send | Subscriber count (lagging) |

Store this in your SOUL.md within the champion block:

<!-- AUTO:CHAMPION START v1 -->
## Content Strategy (Champion v1)
**Primary Metric:** views_48h
**Baseline:** 1200 avg views per clip (last 10 posts)
**Strategy:**
- Hook: Question-based opening
- Length: 60-90 seconds
- Post time: 10am ET weekdays
- Topics: AI tutorials, code walkthroughs
<!-- AUTO:CHAMPION END -->
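Reading the champion block back out is a small regex over the markers. The marker names match the block above; `read_champion_block` itself is an illustrative helper, not one of the skill's scripts:

```python
import re

def read_champion_block(soul_text: str) -> tuple[int, str]:
    """Return (version, body) of the AUTO:CHAMPION block in SOUL.md text."""
    match = re.search(
        r"<!-- AUTO:CHAMPION START v(\d+) -->\n(.*?)<!-- AUTO:CHAMPION END -->",
        soul_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("No AUTO:CHAMPION block found in SOUL.md")
    return int(match.group(1)), match.group(2)
```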

2. Initialize Experiment Tracking

Create experiments/ directory in your workspace. Each experiment gets a file. See references/experiment-template.md for the template.

workspace/
  experiments/
    active.md          ← current running experiment (only ONE)
    archive/
      EXP-001.md       ← completed experiments
      EXP-002.md
  SOUL.md              ← contains AUTO:CHAMPION block
  MEMORY.md            ← contains AUTO:MEMORY block with learnings

3. Set Up Cron

Schedule a cron to check experiment status. Example for 48h evaluation window:

openclaw cron add --agent <agent> \
  --schedule "0 10 * * *" \
  --task "Read experiments/active.md. If evaluation_date has passed, run autoresearch evaluation. Use scripts/autoresearch_analyze.py for analysis and scripts/autoresearch_evolve.py for evolution." \
  --label "autoresearch-eval"

Experiment Tracking

Every experiment follows this lifecycle:

PROPOSED → ACTIVE → EVALUATING → VERDICT (KEEP/MODIFY/KILL) → ARCHIVED
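One reading of that lifecycle as an explicit transition table (the MODIFY → EVALUATING edge is the single allowed window extension; the table is an interpretation, not skill-mandated code):

```python
# Allowed transitions in the experiment lifecycle.
TRANSITIONS = {
    "PROPOSED": {"ACTIVE"},
    "ACTIVE": {"EVALUATING"},
    "EVALUATING": {"KEEP", "MODIFY", "KILL"},
    "MODIFY": {"EVALUATING", "KILL"},  # one extension max, then KILL
    "KEEP": {"ARCHIVED"},
    "KILL": {"ARCHIVED"},
}

def advance(status: str, new_status: str) -> str:
    """Move an experiment to new_status, rejecting illegal jumps."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"Illegal transition {status} -> {new_status}")
    return new_status
```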

Creating an Experiment

Before posting content with a new strategy mutation, create experiments/active.md:

# EXP-003: Test storytelling hooks vs question hooks

**Status:** ACTIVE
**Variable:** hook_style
**Mutation:** storytelling (was: question-based)
**Champion Version:** v2
**Created:** 2025-01-15
**Evaluation Date:** 2025-01-17
**Posts:** []
**Hypothesis:** Storytelling hooks create more emotional investment → higher completion rate → more views

After posting, append the post ID:

**Posts:** [post_abc123, post_def456]

Single-Experiment Rule

experiments/active.md must contain exactly ONE experiment. If it exists and status is ACTIVE, do NOT start a new experiment. Wait for evaluation.

Check before proposing:

  1. Does experiments/active.md exist?
  2. Is its status ACTIVE or EVALUATING?
  3. If yes → do not create new experiment. Post using current champion strategy.
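That check can be automated with a small guard; `can_propose` is a sketch that assumes the `**Status:**` field format shown in the experiment file above:

```python
import os
import re

def can_propose(experiments_dir: str) -> bool:
    """True only if no experiment is currently ACTIVE or EVALUATING."""
    path = os.path.join(experiments_dir, "active.md")
    if not os.path.exists(path):
        return True
    with open(path) as f:
        text = f.read()
    match = re.search(r"\*\*Status:\*\*\s*(\w+)", text)
    return match is None or match.group(1) not in ("ACTIVE", "EVALUATING")
```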

Metrics Collection

Metrics collection is agent-specific. The skill provides the pattern, not the implementation.

Generic Pattern

def collect_metrics(post_ids: list, metric_name: str) -> dict:
    """
    Returns: {
        "post_id": {"metric": value, "collected_at": timestamp},
        ...
    }
    """
    # Agent implements this using their platform:
    # - Postiz API
    # - Twitter/X API
    # - YouTube Analytics
    # - Manual input from user
    pass

Integration Points

| Platform | How to Get Metrics |
|---|---|
| Postiz | `postiz analytics --post-id <id>` or API |
| X/Twitter | Browser scrape via snapshot, or API |
| YouTube | YouTube Data API or Studio scrape |
| Manual | Ask user to paste metrics, store in experiment file |

Baseline Calculation

Maintain a rolling baseline in your champion block. Update after each KEEP verdict:

Baseline = mean(last 10 champion-strategy posts)

Never include experimental posts in baseline unless they became the new champion.
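A minimal sketch of that rolling mean, assuming the caller has already filtered out experimental posts:

```python
def rolling_baseline(champion_metrics: list[float], window: int = 10) -> float:
    """Mean of the last `window` champion-strategy posts."""
    if not champion_metrics:
        raise ValueError("Need at least one champion post to form a baseline")
    recent = champion_metrics[-window:]
    return sum(recent) / len(recent)
```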

Evaluation & Verdict

When evaluation_date arrives, run evaluation. Use scripts/autoresearch_analyze.py as reference or implement inline.

Verdict Logic

improvement = (experiment_avg - baseline) / baseline

if improvement >= threshold:       # default: +10%
    verdict = "KEEP"
elif improvement >= -threshold:    # within noise band
    verdict = "MODIFY"
else:                              # significant regression
    verdict = "KILL"

Default thresholds:

  • KEEP: ≥ +10% improvement over baseline
  • MODIFY: Between -10% and +10% (inconclusive)
  • KILL: ≤ -10% regression
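The verdict logic above, wrapped as a runnable function with the same default threshold:

```python
def verdict(experiment_avg: float, baseline: float, threshold: float = 0.10) -> str:
    """Return KEEP / MODIFY / KILL from relative improvement over baseline."""
    improvement = (experiment_avg - baseline) / baseline
    if improvement >= threshold:
        return "KEEP"       # clear win: promote mutation to champion
    if improvement >= -threshold:
        return "MODIFY"     # within the noise band: inconclusive
    return "KILL"           # significant regression: revert
```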

Verdict Actions

KEEP:

  1. Mutation becomes part of champion strategy
  2. Increment champion version (v2 → v3)
  3. Update AUTO:CHAMPION block in SOUL.md
  4. Update baseline with new data
  5. Archive experiment as KEEP
  6. Log learning to AUTO:MEMORY block

MODIFY:

  1. Results inconclusive — adjust the mutation or extend evaluation
  2. Can extend evaluation window ONCE (another 48h)
  3. If still MODIFY after extension → treat as KILL
  4. Do NOT stack a new mutation on top

KILL:

  1. Revert to champion strategy (no changes to SOUL.md)
  2. Archive experiment as KILL
  3. Log learning (what didn't work) to AUTO:MEMORY block
  4. Increment kill_streak counter
  5. If kill_streak >= 3 → pause experiments, request human review

Revert Mechanism

If kill_streak reaches threshold (default: 3 consecutive KILLs):

  1. Stop proposing new experiments
  2. Post using pure champion strategy
  3. Log to memory: "Experiment pause: 3 consecutive failures"
  4. Wait for human to review and reset, OR auto-resume after cooldown (default: 7 days)

Evolution Protocol

When a verdict is KEEP, update your SOUL.md. See references/evolution-protocol.md for full rules. Summary:

Updating the Champion Block

Use marker-based editing to update ONLY the champion section:

<!-- AUTO:CHAMPION START v3 -->
## Content Strategy (Champion v3)
**Primary Metric:** views_48h
**Baseline:** 1450 avg views per clip (last 10 posts)
**Strategy:**
- Hook: Storytelling opening ← CHANGED from v2
- Length: 60-90 seconds
- Post time: 10am ET weekdays
- Topics: AI tutorials, code walkthroughs
**Changelog:**
- v3: Storytelling hooks (+18% views, EXP-003)
- v2: 10am posting time (+12% views, EXP-002)
- v1: Initial strategy
<!-- AUTO:CHAMPION END -->
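Marker-based editing boils down to a bounded regex substitution; this helper mirrors what `scripts/autoresearch_evolve.py` is described as doing, but is an illustrative sketch rather than that script's actual code:

```python
import re

def write_champion_block(soul_text: str, version: int, body: str) -> str:
    """Replace the AUTO:CHAMPION block in SOUL.md text, bumping its version tag."""
    new_block = (f"<!-- AUTO:CHAMPION START v{version} -->\n"
                 f"{body}\n"
                 f"<!-- AUTO:CHAMPION END -->")
    # Function replacement avoids backslash-escape surprises in the body.
    return re.sub(
        r"<!-- AUTO:CHAMPION START v\d+ -->.*?<!-- AUTO:CHAMPION END -->",
        lambda _match: new_block,
        soul_text,
        count=1,
        flags=re.DOTALL,
    )
```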

Updating Memory

Use AUTO:MEMORY markers in MEMORY.md for autoresearch learnings:

<!-- AUTO:MEMORY START -->
## Autoresearch Learnings
- Storytelling hooks outperform questions by ~18% (EXP-003, 2025-01-17)
- 10am ET is optimal post time for weekday content (EXP-002, 2025-01-10)
- Thumbnail text > no text for clips (EXP-001, 2025-01-03)
### What Doesn't Work
- Clickbait hooks: -22% views, higher bounce (EXP-004, 2025-01-24)
- Posts > 2min: -15% completion rate (EXP-005, 2025-01-31)
<!-- AUTO:MEMORY END -->

Keep this section lean. Max 20 entries. Archive older learnings to experiments/archive/.

Hook Variants (A/B Testing)

For content where you can test multiple variants:

  1. Generate N variants (2-3 max) of the ONE variable being tested
  2. Pick one to post (random or by agent judgment)
  3. Track which variant was used in experiment file
  4. After evaluation, the variant choice itself becomes data

Example for testing hook styles:

**Variants Generated:**
1. "Ever wondered why transformers work?" (question)
2. "I spent 3 days debugging attention masks..." (storytelling)
3. "The #1 mistake in transformer training" (listicle)
**Selected:** Variant 2 (storytelling)
**Rationale:** Testing storytelling vs champion (question-based)

Do NOT post all variants simultaneously. That's multivariate testing, not autoresearch. Post ONE variant per experiment cycle.

Examples by Content Type

Video Clips (e.g., Ada posting AI explainers)

Champion: 60s clips, question hooks, 10am ET
Experiment: Test storytelling hooks
→ Post 3 clips over 1 week with storytelling hooks
→ Wait 48h after last clip
→ Compare avg views vs baseline
→ KEEP if +10%, KILL if -10%

Tweets (e.g., Echo posting AI commentary)

Champion: Thread format, data-driven takes, noon ET
Experiment: Test single-tweet hot takes vs threads
→ Post 5 single tweets over 3 days
→ Wait 24h after last tweet
→ Compare avg impressions vs thread baseline
→ KEEP/MODIFY/KILL

Blog Posts (e.g., long-form content)

Champion: 1500 words, tutorial format, Monday publish
Experiment: Test shorter posts (800 words)
→ Post 2 short articles over 2 weeks
→ Wait 7 days after last post
→ Compare avg reads vs baseline
→ Longer evaluation window because blog traffic is slower

Integration with OpenClaw

SOUL.md

Add the AUTO:CHAMPION block to your agent's SOUL.md. The agent reads this every session and uses it as the current content strategy. Use scripts/autoresearch_evolve.py to programmatically update this block.

MEMORY.md

Add the AUTO:MEMORY block to your agent's MEMORY.md. Learnings accumulate here. Keep it under 20 entries; archive to experiment files.

Cron Jobs

Set up evaluation cron as shown in Setup step 3. Recommended cadence:

  • Daily check at 10am for whether evaluation window has passed
  • Weekly memory cleanup (archive old experiment data)

HyperClaw Tasks

Optionally create HyperClaw tasks for experiments:

hyperclaw_add_task(
    title="EXP-003: Test storytelling hooks",
    description="...",
    agent="ada",
    priority="medium"
)

Update task status as experiment progresses. Provides visibility in dashboard.

Anti-Patterns

❌ Stacking Mutations

Testing hook style AND post time simultaneously. You learn nothing about either.

❌ No Baseline

Running experiments without establishing a baseline first. Need ≥10 posts with champion strategy before starting experiments.

❌ Vibes-Based Verdicts

"This post felt like it did well" is not a metric. Use numbers. Always.

❌ Never Reverting

Keeping a mutation because you "like it" despite data showing regression. The loss function decides, not your feelings.

❌ Too Many Variants

Generating 10 hook variants and A/B testing all of them. Keep it to 2-3 max, post ONE.

❌ Moving the Goalposts

Switching your primary metric mid-experiment because the numbers look bad on the original. Pick a metric and stick with it.

❌ Infinite MODIFY

Extending evaluation windows repeatedly. One extension max, then it's a KILL.

❌ Memory Bloat

Keeping every experiment detail in MEMORY.md. Only learnings go in memory. Raw data goes in experiment archive files.

❌ Experimenting During Low-Data Periods

Don't start experiments during holidays, platform outages, or other anomalous periods. Your baseline becomes meaningless.

Reference Files

  • references/experiment-template.md — Copy-paste template for new experiments
  • references/memory-structure.md — Recommended memory format and archival rules
  • references/evolution-protocol.md — Detailed rules for updating SOUL.md champion blocks

Scripts

  • scripts/autoresearch_analyze.py — Analyze experiment results vs baseline. Usage: python3 scripts/autoresearch_analyze.py experiments/active.md --baseline 1200 --threshold 0.10
  • scripts/autoresearch_evolve.py — Update SOUL.md champion block after KEEP verdict. Usage: python3 scripts/autoresearch_evolve.py SOUL.md --version 3 --change "storytelling hooks" --experiment EXP-003 --metric-delta "+18%"