Content Autoresearch
Implement Karpathy-style autoresearch loops for content optimization. Autonomous cycle: post content → wait for metrics → evaluate vs baseline → keep/discard strategy → evolve agent playbook → repeat.
Install
openclaw skills install content-autoresearch
Latest release: 1.0.0
name: autoresearch
description: >
  Implement Karpathy-style autoresearch loops for content optimization.
  Autonomous cycle: post content → wait for metrics → evaluate vs baseline →
  keep/discard strategy → evolve agent playbook → repeat. Use when an agent
  wants to systematically improve content performance through experiments with
  concrete metrics, binary keep/discard decisions, versioned strategy
  playbooks, and auto-revert on failure. Triggers: "autoresearch",
  "experiment loop", "optimize content strategy", "A/B test content",
  "track content performance", "evolve my posting strategy",
  "content experiment".
Autoresearch
Adapt Karpathy's autoresearch pattern (modify → train → check metric → keep/discard → repeat)
for content creation agents. Instead of modifying train.py and checking loss, you modify your
content strategy and check engagement metrics.
The key insight: treat content strategy like a neural net. You have a "champion" (current best strategy). You propose one mutation. You evaluate it against a concrete metric. You keep it or revert. No hand-waving, no "try harder" — binary decisions backed by data.
Core Loop
1. POST content using current champion strategy + one experimental mutation
2. WAIT for metrics (default: 48h evaluation window)
3. EVALUATE performance vs champion baseline
4. VERDICT: KEEP (new champion) / MODIFY (adjust mutation) / KILL (revert)
5. EVOLVE: Update SOUL.md champion playbook based on verdict
6. REPEAT from step 1
Cardinal rule: ONE experiment at a time. Never stack mutations. If you change posting time AND hook style simultaneously, you learn nothing.
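The loop above can be sketched as a small state machine. This is a minimal illustration, not part of the skill itself; the state names are ours, and the default threshold matches the ±10% verdict band defined later:

```python
def autoresearch_step(state, metrics_ready=False, improvement=0.0, threshold=0.10):
    """One tick of the autoresearch loop. Returns the next state.

    POSTED -> WAITING -> EVALUATING -> (KEEP | MODIFY | KILL) -> POSTED
    """
    if state == "POSTED":
        return "WAITING"
    if state == "WAITING":
        # Stay here until the evaluation window (default 48h) has passed.
        return "EVALUATING" if metrics_ready else "WAITING"
    if state == "EVALUATING":
        if improvement >= threshold:
            return "KEEP"      # mutation becomes the new champion
        if improvement >= -threshold:
            return "MODIFY"    # inconclusive; adjust or extend once
        return "KILL"          # revert to champion
    # Any verdict loops back to posting the next experiment.
    return "POSTED"
```

Note that every path eventually returns to POSTED: a verdict is never a stopping point, only a decision about what the next post uses.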
Setup
1. Define Your Loss Function
Pick ONE primary metric. This is your loss function. Everything else is secondary.
| Content Type | Good Primary Metrics | Bad Primary Metrics |
|---|---|---|
| Video clips | Views per 48h | "Vibes", likes (too noisy) |
| Tweets/posts | Impressions per 24h | Follower count (too slow) |
| Blog posts | Reads per 7d | Comments (too sparse) |
| Newsletter | Open rate per send | Subscriber count (lagging) |
Store this in your SOUL.md within the champion block:
<!-- AUTO:CHAMPION START v1 -->
## Content Strategy (Champion v1)
**Primary Metric:** views_48h
**Baseline:** 1200 avg views per clip (last 10 posts)
**Strategy:**
- Hook: Question-based opening
- Length: 60-90 seconds
- Post time: 10am ET weekdays
- Topics: AI tutorials, code walkthroughs
<!-- AUTO:CHAMPION END -->
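An agent can read this block back programmatically. The following is a sketch, not the skill's official parser; it assumes the exact block format shown above (version in the START marker, `**Primary Metric:**` and `**Baseline:**` lines):

```python
import re

def read_champion(soul_text):
    """Extract the AUTO:CHAMPION block's version, metric, and baseline."""
    m = re.search(
        r"<!-- AUTO:CHAMPION START v(\d+) -->(.*?)<!-- AUTO:CHAMPION END -->",
        soul_text, re.DOTALL)
    if not m:
        return None
    version, body = int(m.group(1)), m.group(2)
    metric = re.search(r"\*\*Primary Metric:\*\*\s*(\S+)", body)
    baseline = re.search(r"\*\*Baseline:\*\*\s*(\d+)", body)
    return {
        "version": version,
        "metric": metric.group(1) if metric else None,
        "baseline": int(baseline.group(1)) if baseline else None,
    }
```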
2. Initialize Experiment Tracking
Create experiments/ directory in your workspace. Each experiment gets a file.
See references/experiment-template.md for the template.
workspace/
experiments/
active.md ← current running experiment (only ONE)
archive/
EXP-001.md ← completed experiments
EXP-002.md
SOUL.md ← contains AUTO:CHAMPION block
MEMORY.md ← contains AUTO:MEMORY block with learnings
3. Set Up Cron
Schedule a cron to check experiment status. Example for 48h evaluation window:
openclaw cron add --agent <agent> \
--schedule "0 10 * * *" \
--task "Read experiments/active.md. If evaluation_date has passed, run autoresearch evaluation. Use scripts/autoresearch_analyze.py for analysis and scripts/autoresearch_evolve.py for evolution." \
--label "autoresearch-eval"
Experiment Tracking
Every experiment follows this lifecycle:
PROPOSED → ACTIVE → EVALUATING → VERDICT (KEEP/MODIFY/KILL) → ARCHIVED
Creating an Experiment
Before posting content with a new strategy mutation, create experiments/active.md:
# EXP-003: Test storytelling hooks vs question hooks
**Status:** ACTIVE
**Variable:** hook_style
**Mutation:** storytelling (was: question-based)
**Champion Version:** v2
**Created:** 2025-01-15
**Evaluation Date:** 2025-01-17
**Posts:** []
**Hypothesis:** Storytelling hooks create more emotional investment → higher completion rate → more views
After posting, append the post ID:
**Posts:** [post_abc123, post_def456]
Single-Experiment Rule
experiments/active.md must contain exactly ONE experiment. If it exists and status is
ACTIVE, do NOT start a new experiment. Wait for evaluation.
Check before proposing:
- Does experiments/active.md exist?
- Is its status ACTIVE or EVALUATING?
- If yes → do not create a new experiment. Post using the current champion strategy.
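That check can be a one-liner guard. A sketch, assuming the `**Status:**` line format shown in the experiment file above (pass `None` when experiments/active.md does not exist):

```python
import re

def can_start_experiment(active_md_text):
    """Single-experiment rule: True only if no experiment is running."""
    if active_md_text is None:
        return True
    m = re.search(r"\*\*Status:\*\*\s*(\w+)", active_md_text)
    return m is None or m.group(1) not in ("ACTIVE", "EVALUATING")
```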
Metrics Collection
Metrics collection is agent-specific. The skill provides the pattern, not the implementation.
Generic Pattern
def collect_metrics(post_ids: list, metric_name: str) -> dict:
    """
    Returns: {
        "post_id": {"metric": value, "collected_at": timestamp},
        ...
    }
    """
    # Agent implements this using their platform:
    # - Postiz API
    # - Twitter/X API
    # - YouTube Analytics
    # - Manual input from user
    pass
Integration Points
| Platform | How to Get Metrics |
|---|---|
| Postiz | postiz analytics --post-id <id> or API |
| X/Twitter | Browser scrape via snapshot, or API |
| YouTube | YouTube Data API or Studio scrape |
| Manual | Ask user to paste metrics, store in experiment file |
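As one concrete instance of the pattern, the manual path can parse metrics the user pasted into the experiment file. The `- post_id: value` line format below is an illustrative convention, not something the skill mandates:

```python
import re

def collect_manual_metrics(experiment_text, metric_name):
    """Parse manually pasted metric lines like '- post_abc123: 1430'."""
    metrics = {}
    for post_id, value in re.findall(r"^- (\S+):\s*(\d+)", experiment_text, re.MULTILINE):
        metrics[post_id] = {metric_name: int(value)}
    return metrics
```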
Baseline Calculation
Maintain a rolling baseline in your champion block. Update after each KEEP verdict:
Baseline = mean(last 10 champion-strategy posts)
Never include experimental posts in baseline unless they became the new champion.
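In code, the rolling baseline is just a trailing mean. A sketch: `champion_views` is a hypothetical list of per-post metric values for champion-strategy posts, oldest first:

```python
def rolling_baseline(champion_views, window=10):
    """Mean of the last `window` champion-strategy posts.

    Experimental posts are excluded from the input unless their mutation
    was KEPT, i.e. they became champion posts.
    """
    recent = champion_views[-window:]
    return sum(recent) / len(recent)
```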
Evaluation & Verdict
When evaluation_date arrives, run the evaluation. Use scripts/autoresearch_analyze.py as a
reference or implement it inline.
Verdict Logic
def verdict(experiment_avg: float, baseline: float, threshold: float = 0.10) -> str:
    improvement = (experiment_avg - baseline) / baseline
    if improvement >= threshold:      # default: +10%
        return "KEEP"
    elif improvement >= -threshold:   # within noise band
        return "MODIFY"
    else:                             # significant regression
        return "KILL"
Default thresholds:
- KEEP: ≥ +10% improvement over baseline
- MODIFY: Between -10% and +10% (inconclusive)
- KILL: ≤ -10% regression
Verdict Actions
KEEP:
- Mutation becomes part of champion strategy
- Increment champion version (v2 → v3)
- Update AUTO:CHAMPION block in SOUL.md
- Update baseline with new data
- Archive experiment as KEEP
- Log learning to AUTO:MEMORY block
MODIFY:
- Results inconclusive — adjust the mutation or extend evaluation
- Can extend evaluation window ONCE (another 48h)
- If still MODIFY after extension → treat as KILL
- Do NOT stack a new mutation on top
KILL:
- Revert to champion strategy (no changes to SOUL.md)
- Archive experiment as KILL
- Log learning (what didn't work) to AUTO:MEMORY block
- Increment kill_streak counter
- If kill_streak >= 3 → pause experiments, request human review
Revert Mechanism
If kill_streak reaches threshold (default: 3 consecutive KILLs):
- Stop proposing new experiments
- Post using pure champion strategy
- Log to memory: "Experiment pause: 3 consecutive failures"
- Wait for human to review and reset, OR auto-resume after cooldown (default: 7 days)
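The gate can be expressed as one predicate. A sketch with defaults matching the skill's (3 kills, 7-day cooldown); `paused_on` is the date the pause began, `None` if no pause has been recorded yet:

```python
from datetime import date, timedelta

def experiments_allowed(kill_streak, paused_on=None, today=None,
                        max_kills=3, cooldown_days=7):
    """Return True if the agent may propose a new experiment."""
    if kill_streak < max_kills:
        return True
    if paused_on is None:
        return False  # streak just hit the threshold; pause starts now
    today = today or date.today()
    # Auto-resume once the cooldown has elapsed (or a human resets kill_streak).
    return today >= paused_on + timedelta(days=cooldown_days)
```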
Evolution Protocol
When a verdict is KEEP, update your SOUL.md. See references/evolution-protocol.md for
full rules. Summary:
Updating the Champion Block
Use marker-based editing to update ONLY the champion section:
<!-- AUTO:CHAMPION START v3 -->
## Content Strategy (Champion v3)
**Primary Metric:** views_48h
**Baseline:** 1450 avg views per clip (last 10 posts)
**Strategy:**
- Hook: Storytelling opening ← CHANGED from v2
- Length: 60-90 seconds
- Post time: 10am ET weekdays
- Topics: AI tutorials, code walkthroughs
**Changelog:**
- v3: Storytelling hooks (+18% views, EXP-003)
- v2: 10am posting time (+12% views, EXP-002)
- v1: Initial strategy
<!-- AUTO:CHAMPION END -->
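Marker-based editing mirrors what scripts/autoresearch_evolve.py is described as doing; the sketch below is not that script's actual code. It replaces only the span between the START and END markers and leaves the rest of the file untouched:

```python
import re

def replace_block(text, marker, new_body, version=None):
    """Replace the content between AUTO:<marker> START/END markers."""
    v = f" v{version}" if version else ""
    start = f"<!-- AUTO:{marker} START{v} -->"
    end = f"<!-- AUTO:{marker} END -->"
    pattern = re.compile(
        rf"<!-- AUTO:{marker} START[^>]*-->.*?<!-- AUTO:{marker} END -->",
        re.DOTALL)
    # Use a callable so backslashes in new_body are not treated as escapes.
    return pattern.sub(lambda m: start + "\n" + new_body + "\n" + end, text)
```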
Updating Memory
Use AUTO:MEMORY markers in MEMORY.md for autoresearch learnings:
<!-- AUTO:MEMORY START -->
## Autoresearch Learnings
- Storytelling hooks outperform questions by ~18% (EXP-003, 2025-01-17)
- 10am ET is optimal post time for weekday content (EXP-002, 2025-01-10)
- Thumbnail text > no text for clips (EXP-001, 2025-01-03)
### What Doesn't Work
- Clickbait hooks: -22% views, higher bounce (EXP-004, 2025-01-24)
- Posts > 2min: -15% completion rate (EXP-005, 2025-01-31)
<!-- AUTO:MEMORY END -->
Keep this section lean. Max 20 entries. Archive older learnings to experiments/archive/.
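Trimming to the 20-entry cap can be sketched as below, assuming learnings are ordered newest first as in the example above, so the overflow (oldest entries) is what gets archived:

```python
def trim_learnings(lines, max_entries=20):
    """Split bullet lines into (kept, overflow-to-archive)."""
    bullets = [l for l in lines if l.lstrip().startswith("- ")]
    return bullets[:max_entries], bullets[max_entries:]
```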
Hook Variants (A/B Testing)
For content where you can test multiple variants:
- Generate N variants (2-3 max) of the ONE variable being tested
- Pick one to post (random or by agent judgment)
- Track which variant was used in experiment file
- After evaluation, the variant choice itself becomes data
Example for testing hook styles:
**Variants Generated:**
1. "Ever wondered why transformers work?" (question)
2. "I spent 3 days debugging attention masks..." (storytelling)
3. "The #1 mistake in transformer training" (listicle)
**Selected:** Variant 2 (storytelling)
**Rationale:** Testing storytelling vs champion (question-based)
Do NOT post all variants simultaneously. That's multivariate testing, not autoresearch. Post ONE variant per experiment cycle.
Examples by Content Type
Video Clips (e.g., Ada posting AI explainers)
Champion: 60s clips, question hooks, 10am ET
Experiment: Test storytelling hooks
→ Post 3 clips over 1 week with storytelling hooks
→ Wait 48h after last clip
→ Compare avg views vs baseline
→ KEEP if +10%, KILL if -10%
Tweets (e.g., Echo posting AI commentary)
Champion: Thread format, data-driven takes, noon ET
Experiment: Test single-tweet hot takes vs threads
→ Post 5 single tweets over 3 days
→ Wait 24h after last tweet
→ Compare avg impressions vs thread baseline
→ KEEP/MODIFY/KILL
Blog Posts (e.g., long-form content)
Champion: 1500 words, tutorial format, Monday publish
Experiment: Test shorter posts (800 words)
→ Post 2 short articles over 2 weeks
→ Wait 7 days after last post
→ Compare avg reads vs baseline
→ Longer evaluation window because blog traffic is slower
Integration with OpenClaw
SOUL.md
Add the AUTO:CHAMPION block to your agent's SOUL.md. The agent reads this every session
and uses it as the current content strategy. Use scripts/autoresearch_evolve.py to
programmatically update this block.
MEMORY.md
Add the AUTO:MEMORY block to your agent's MEMORY.md. Learnings accumulate here.
Keep it under 20 entries; archive to experiment files.
Cron Jobs
Set up evaluation cron as shown in Setup step 3. Recommended cadence:
- Daily check at 10am for whether evaluation window has passed
- Weekly memory cleanup (archive old experiment data)
HyperClaw Tasks
Optionally create HyperClaw tasks for experiments:
hyperclaw_add_task(
    title="EXP-003: Test storytelling hooks",
    description="...",
    agent="ada",
    priority="medium"
)
Update task status as experiment progresses. Provides visibility in dashboard.
Anti-Patterns
❌ Stacking Mutations
Testing hook style AND post time simultaneously. You learn nothing about either.
❌ No Baseline
Running experiments without establishing a baseline first. Need ≥10 posts with champion strategy before starting experiments.
❌ Vibes-Based Verdicts
"This post felt like it did well" is not a metric. Use numbers. Always.
❌ Never Reverting
Keeping a mutation because you "like it" despite data showing regression. The loss function decides, not your feelings.
❌ Too Many Variants
Generating 10 hook variants and A/B testing all of them. Keep it to 2-3 max, post ONE.
❌ Moving the Goalposts
Switching your primary metric mid-experiment because the numbers look bad on the original. Pick a metric and stick with it.
❌ Infinite MODIFY
Extending evaluation windows repeatedly. One extension max, then it's a KILL.
❌ Memory Bloat
Keeping every experiment detail in MEMORY.md. Only learnings go in memory. Raw data goes in experiment archive files.
❌ Experimenting During Low-Data Periods
Don't start experiments during holidays, platform outages, or other anomalous periods. Your baseline becomes meaningless.
Reference Files
- references/experiment-template.md — Copy-paste template for new experiments
- references/memory-structure.md — Recommended memory format and archival rules
- references/evolution-protocol.md — Detailed rules for updating SOUL.md champion blocks
Scripts
- scripts/autoresearch_analyze.py — Analyze experiment results vs baseline. Usage: python3 scripts/autoresearch_analyze.py experiments/active.md --baseline 1200 --threshold 0.10
- scripts/autoresearch_evolve.py — Update SOUL.md champion block after KEEP verdict. Usage: python3 scripts/autoresearch_evolve.py SOUL.md --version 3 --change "storytelling hooks" --experiment EXP-003 --metric-delta "+18%"
