Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

autoresearchclaw-autonomous-research

Fully autonomous research pipeline that turns a topic idea into a complete academic paper with real citations, experiments, and conference-ready LaTeX.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan

VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill claims to run a full autonomous research pipeline (LLM access, sandboxed experiments, web fetching, LaTeX generation). Those capabilities legitimately require LLM API credentials and code to run; however, the registry metadata declares no required env vars or install steps while SKILL.md instructs cloning a GitHub repo and exporting OPENAI_API_KEY / OPENROUTER_API_KEY. That inconsistency between declared requirements and the runtime instructions is concerning.
Instruction Scope
SKILL.md tells the user/agent to git clone a repository, create a Python venv, pip install the project, set API keys, and then run fully autonomous pipelines (including --auto-approve). It also documents options for web fetching, browser-based collection, and running sandboxed experiments that will execute generated code. Those instructions cause the agent/environment to download and execute arbitrary external code and to possibly fetch data from the web — a broad scope that goes beyond a small helper script and could be abused or misconfigured.
Install Mechanism
The skill package itself contains no install spec, but the embedded instructions direct cloning and pip installing from an external GitHub repo (https://github.com/aiming-lab/AutoResearchClaw.git). Installing and running code from an external/unverified repo is higher risk because it writes and executes code on disk; the registry should have declared this or provided vetted install metadata.
Credentials
SKILL.md requires LLM API keys (OPENAI_API_KEY, OPENROUTER_API_KEY) and references provider-specific config, but the registry lists no required env vars or primary credential. Asking for LLM API keys is proportionate to the stated purpose, but the omission from declared requirements (and the presence of other optional bridges like OpenClaw memory/cron) is an incoherence that reduces transparency and raises the risk of accidental key exposure or misuse.
Persistence & Privilege
The `always` flag is false and model invocation is allowed (normal). The skill mentions optional features that imply persistence (use_memory, use_cron, use_sessions_spawn, openclaw_bridge), which could store cross-session knowledge or schedule runs; this is not inherently forbidden, but combined with the ability to install and execute code, and with auto-approve mode, it increases the operational blast radius. No explicit request to modify other skills or system-wide configs is present.
What to consider before installing
This skill instructs you to clone and pip-install an external GitHub project and to export LLM API keys, but the registry metadata does not declare those requirements — treat that as a red flag. Before installing or running:

1. Inspect the GitHub repo source code yourself (or ask for a verified upstream release) to ensure it does what it claims and contains no exfiltration or backdoor code.
2. Do not run with --auto-approve on a production machine; test in an isolated VM or container with no sensitive credentials.
3. Use least-privilege API keys (rate-limited, scoped) and avoid exposing high-value credentials.
4. If you must run experiments, run them in a network-restricted sandbox and inspect generated experiment code before execution.
5. Prefer a published release with checksums, or an official package, rather than blindly running `git clone && pip install -e .` from an unverified repo.

If you cannot verify the upstream code, avoid installing this skill.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

SKILL.md

AutoResearchClaw — Autonomous Research Pipeline

Skill by ara.so — Daily 2026 Skills collection.

AutoResearchClaw is a fully autonomous 23-stage research pipeline that takes a natural language topic and produces a complete academic paper: real arXiv/Semantic Scholar citations, sandboxed experiments, statistical analysis, multi-agent peer review, and conference-ready LaTeX (NeurIPS/ICML/ICLR). No hallucinated references. No human babysitting.


Installation

# Clone and install
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# Verify CLI is available
researchclaw --help

Requirements: Python 3.11+


Configuration

cp config.researchclaw.example.yaml config.arc.yaml

Minimum config (config.arc.yaml)

project:
  name: "my-research"

research:
  topic: "Your research topic here"

llm:
  provider: "openai"
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"

export OPENAI_API_KEY="$YOUR_OPENAI_KEY"
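The `api_key_env` field names an environment variable rather than embedding the key in the config file. How the project resolves it internally is not documented here, but the usual pattern for this kind of indirection looks like the sketch below (the function name and error text are assumptions):

```python
import os

def resolve_api_key(api_key_env: str) -> str:
    """Look up the API key named by the config's api_key_env field."""
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(
            f"Environment variable {api_key_env} is not set; "
            "export it before running the pipeline."
        )
    return key
```

Keeping only the variable name in the config means the YAML file can be committed without leaking the key itself.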

OpenRouter config (200+ models)

llm:
  provider: "openrouter"
  api_key_env: "OPENROUTER_API_KEY"
  primary_model: "anthropic/claude-3.5-sonnet"
  fallback_models:
    - "google/gemini-pro-1.5"
    - "meta-llama/llama-3.1-70b-instruct"

export OPENROUTER_API_KEY="$YOUR_OPENROUTER_KEY"

ACP (Agent Client Protocol) — no API key needed

llm:
  provider: "acp"
  acp:
    agent: "claude"   # or: codex, gemini, opencode, kimi
    cwd: "."

The agent CLI (e.g. claude) handles its own authentication.

OpenClaw bridge (optional advanced capabilities)

openclaw_bridge:
  use_cron: true              # Scheduled research runs
  use_message: true           # Progress notifications
  use_memory: true            # Cross-session knowledge persistence
  use_sessions_spawn: true    # Parallel sub-sessions
  use_web_fetch: true         # Live web search in literature review
  use_browser: false          # Browser-based paper collection

Key CLI Commands

# Basic run — fully autonomous, no prompts
researchclaw run --topic "Your research idea" --auto-approve

# Run with explicit config file
researchclaw run --config config.arc.yaml --topic "Mixture-of-experts routing efficiency" --auto-approve

# Run with topic defined in config (omit --topic flag)
researchclaw run --config config.arc.yaml --auto-approve

# Interactive mode — pauses at gate stages for approval
researchclaw run --config config.arc.yaml --topic "Your topic"

# Check pipeline status / resume a run
researchclaw status --run-id rc-20260315-120000-abc123

# List past runs
researchclaw list

Gate stages (5, 9, 20) pause for human approval in interactive mode. Pass --auto-approve to skip all gates.
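The gate behaviour above can be pictured as a simple approval check. This is a conceptual sketch, not the project's actual implementation — `approve_gate` and its signature are invented for illustration:

```python
GATE_STAGES = {5, 9, 20}  # LITERATURE_SCREEN, EXPERIMENT_DESIGN, QUALITY_GATE

def approve_gate(stage: int, auto_approve: bool, ask=input) -> bool:
    """Return True if the pipeline may proceed past this stage."""
    if stage not in GATE_STAGES:
        return True          # non-gate stages never pause
    if auto_approve:
        return True          # --auto-approve skips every gate
    return ask(f"Approve stage {stage}? [y/N] ").strip().lower() == "y"
```

In interactive mode a non-"y" answer halts the run at the gate, which is why --auto-approve deserves the caution flagged in the security scan above.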


Python API

from researchclaw.pipeline import Runner
from researchclaw.config import load_config

# Load config and run
config = load_config("config.arc.yaml")
config.research.topic = "Efficient attention mechanisms for long-context LLMs"
config.auto_approve = True

runner = Runner(config)
result = runner.run()

# Access outputs
print(result.artifact_dir)          # artifacts/rc-YYYYMMDD-HHMMSS-<hash>/
print(result.deliverables_dir)      # .../deliverables/
print(result.paper_draft_path)      # .../deliverables/paper_draft.md
print(result.latex_path)            # .../deliverables/paper.tex
print(result.bibtex_path)           # .../deliverables/references.bib
print(result.verification_report)   # .../deliverables/verification_report.json

# Run specific stages only
from researchclaw.pipeline import Runner, StageRange

runner = Runner(config)
result = runner.run(stages=StageRange(start="LITERATURE_COLLECT", end="KNOWLEDGE_EXTRACT"))

# Access knowledge base after a run
from researchclaw.knowledge import KnowledgeBase

kb = KnowledgeBase.load(result.artifact_dir)
findings = kb.get("findings")
literature = kb.get("literature")
decisions = kb.get("decisions")

Output Structure

After a run, all outputs land in artifacts/rc-YYYYMMDD-HHMMSS-<hash>/:

artifacts/rc-20260315-120000-abc123/
├── deliverables/
│   ├── paper_draft.md          # Full academic paper (Markdown)
│   ├── paper.tex               # Conference-ready LaTeX
│   ├── references.bib          # Real BibTeX — auto-pruned to inline citations
│   ├── verification_report.json # 4-layer citation integrity report
│   └── reviews.md              # Multi-agent peer review
├── experiment_runs/
│   ├── run_001/
│   │   ├── code/               # Generated experiment code
│   │   ├── results.json        # Structured metrics
│   │   └── sandbox_output.txt  # Execution logs
├── charts/
│   └── *.png                   # Auto-generated comparison charts
├── evolution/
│   └── lessons.json            # Self-learning lessons for future runs
└── knowledge_base/
    ├── decisions.json
    ├── experiments.json
    ├── findings.json
    ├── literature.json
    ├── questions.json
    └── reviews.json

Pipeline Stages Reference

| Phase | Stage # | Name | Notes |
|-------|---------|------|-------|
| A | 1 | TOPIC_INIT | Parse and scope research topic |
| A | 2 | PROBLEM_DECOMPOSE | Break into sub-problems |
| B | 3 | SEARCH_STRATEGY | Build search queries |
| B | 4 | LITERATURE_COLLECT | Real API calls to arXiv + Semantic Scholar |
| B | 5 | LITERATURE_SCREEN | Gate — approve/reject literature |
| B | 6 | KNOWLEDGE_EXTRACT | Extract structured knowledge |
| C | 7 | SYNTHESIS | Synthesize findings |
| C | 8 | HYPOTHESIS_GEN | Multi-agent debate to form hypotheses |
| D | 9 | EXPERIMENT_DESIGN | Gate — approve/reject design |
| D | 10 | CODE_GENERATION | Generate experiment code |
| D | 11 | RESOURCE_PLANNING | GPU/MPS/CPU auto-detection |
| E | 12 | EXPERIMENT_RUN | Sandboxed execution |
| E | 13 | ITERATIVE_REFINE | Self-healing on failure |
| F | 14 | RESULT_ANALYSIS | Multi-agent analysis |
| F | 15 | RESEARCH_DECISION | PROCEED / REFINE / PIVOT |
| G | 16 | PAPER_OUTLINE | Structure paper |
| G | 17 | PAPER_DRAFT | Write full paper |
| G | 18 | PEER_REVIEW | Evidence-consistency check |
| G | 19 | PAPER_REVISION | Incorporate review feedback |
| H | 20 | QUALITY_GATE | Gate — final approval |
| H | 21 | KNOWLEDGE_ARCHIVE | Save lessons to KB |
| H | 22 | EXPORT_PUBLISH | Emit LaTeX + BibTeX |
| H | 23 | CITATION_VERIFY | 4-layer anti-hallucination check |

Common Patterns

Pattern: Quick paper on a topic

export OPENAI_API_KEY="$YOUR_OPENAI_KEY"
researchclaw run \
  --topic "Self-supervised learning for protein structure prediction" \
  --auto-approve

Pattern: Reproducible run with full config

# config.arc.yaml
project:
  name: "protein-ssl-research"

research:
  topic: "Self-supervised learning for protein structure prediction"

llm:
  provider: "openai"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
  max_iterations: 3
  timeout_seconds: 300

researchclaw run --config config.arc.yaml --auto-approve

Pattern: Use Claude via OpenRouter for best reasoning

export OPENROUTER_API_KEY="$YOUR_OPENROUTER_KEY"

cat > config.arc.yaml << 'EOF'
project:
  name: "my-research"
llm:
  provider: "openrouter"
  api_key_env: "OPENROUTER_API_KEY"
  primary_model: "anthropic/claude-3.5-sonnet"
  fallback_models: ["google/gemini-pro-1.5"]
experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
EOF

researchclaw run --config config.arc.yaml \
  --topic "Efficient KV cache compression for transformer inference" \
  --auto-approve

Pattern: Resume after a failed run

# List runs to find the run ID
researchclaw list

# Resume from last completed stage
researchclaw run --resume rc-20260315-120000-abc123

Pattern: Programmatic batch research

from researchclaw.pipeline import Runner
from researchclaw.config import load_config

topics = [
    "LoRA fine-tuning on limited hardware",
    "Speculative decoding for LLM inference",
    "Flash attention variants comparison",
]

config = load_config("config.arc.yaml")
config.auto_approve = True

for topic in topics:
    config.research.topic = topic
    runner = Runner(config)
    result = runner.run()
    print(f"[{topic}] → {result.deliverables_dir}")

Pattern: OpenClaw one-liner (if using OpenClaw agent)

Share the repo URL with OpenClaw, then say:
"Research mixture-of-experts routing efficiency"

OpenClaw auto-reads RESEARCHCLAW_AGENTS.md, clones, installs, configures, and runs the full pipeline.


Compile the LaTeX Output

# Navigate to deliverables
cd artifacts/rc-*/deliverables/

# Compile (requires a LaTeX distribution)
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex

# Or upload paper.tex + references.bib directly to Overleaf

Troubleshooting

researchclaw: command not found

# Make sure the venv is active and package is installed
source .venv/bin/activate
pip install -e .
which researchclaw

API key errors

# Verify env var is set
echo $OPENAI_API_KEY
# Should print your key (not empty)

# Set it explicitly for the session
export OPENAI_API_KEY="sk-..."

Experiment sandbox failures

The pipeline self-heals at Stage 13 (ITERATIVE_REFINE). If it keeps failing:

# Increase timeout and iterations in config
experiment:
  max_iterations: 5
  timeout_seconds: 600
  sandbox:
    python_path: ".venv/bin/python"

Citation hallucination warnings

Stage 23 (CITATION_VERIFY) runs a 4-layer check. If references are pruned:

  • This is expected behaviour — fake citations are removed automatically
  • Check verification_report.json for details on which citations were rejected and why

PIVOT loop running indefinitely

Stage 15 (RESEARCH_DECISION) may pivot multiple times. To cap iterations:

research:
  max_pivots: 2
  max_refines: 3
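Conceptually, those caps bound the decision loop at Stage 15. The sketch below only illustrates the idea of a pivot/refine budget — it is not the project's code, and `run_decision_loop` and its return strings are invented:

```python
def run_decision_loop(decide, max_pivots=2, max_refines=3) -> str:
    """Loop until PROCEED, or until the pivot/refine budget is exhausted."""
    pivots = refines = 0
    while True:
        decision = decide()  # stand-in for the Stage 15 RESEARCH_DECISION call
        if decision == "PROCEED":
            return "PROCEED"
        if decision == "PIVOT":
            pivots += 1
            if pivots > max_pivots:
                return "STOPPED: pivot budget exhausted"
        elif decision == "REFINE":
            refines += 1
            if refines > max_refines:
                return "STOPPED: refine budget exhausted"
```

With `max_pivots: 2`, a third PIVOT terminates the run instead of starting yet another hypothesis direction.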

LaTeX compilation errors

# Check for missing packages
pdflatex paper.tex 2>&1 | grep "File.*not found"

# Install missing packages (TeX Live)
tlmgr install <package-name>

Out of memory during experiments

# Force CPU mode in config
experiment:
  sandbox:
    device: "cpu"
    max_memory_gb: 4

Key Concepts

  • PIVOT/REFINE Loop: Stage 15 autonomously decides PROCEED, REFINE (tweak params), or PIVOT (new hypothesis direction). All artifacts are versioned.
  • Multi-Agent Debate: Stages 8, 14, 18 use structured multi-perspective debate — not a single LLM pass.
  • Self-Learning: Each run extracts lessons with 30-day time decay. Future runs on similar topics benefit from past mistakes.
  • Sentinel Watchdog: Background monitor detects NaN/Inf in results, checks paper-evidence consistency, scores citation relevance, and guards against fabrication throughout the run.
  • 4-Layer Citation Verification: arXiv lookup → CrossRef lookup → DataCite lookup → LLM relevance scoring. A citation must pass all layers to survive.
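The all-layers-must-pass rule in the last bullet can be sketched as a short verification chain. The layer functions here are stubs standing in for the real arXiv, CrossRef, DataCite, and LLM relevance checks — all names are illustrative:

```python
from typing import Callable

# A layer takes a citation string and reports pass/fail. In the real
# pipeline these would query arXiv, CrossRef, DataCite, and an LLM scorer.
Layer = Callable[[str], bool]

def verify_citation(citation: str, layers: list[Layer]) -> bool:
    """A citation survives only if every layer passes (short-circuits on failure)."""
    return all(layer(citation) for layer in layers)

def prune_citations(citations: list[str], layers: list[Layer]) -> list[str]:
    """Keep only citations that pass all verification layers."""
    return [c for c in citations if verify_citation(c, layers)]
```

This matches the troubleshooting note above: pruned references are not an error but the chain rejecting entries that failed some layer, with the reasons recorded in verification_report.json.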
