Install
openclaw skills install arxiv-search
Retrieve paper metadata from arXiv using keyword queries and save results as JSONL (`papers/papers_raw.jsonl`). **Trigger**: arXiv, arxiv, paper search, meta...
Collect an initial paper set with enough metadata to support downstream ranking, taxonomy building, and citation generation.
When online, prefer rich arXiv metadata (categories, arxiv_id, pdf_url, published/updated, etc.). When offline, accept an export and convert it cleanly.
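As a rough illustration of the offline path, the sketch below converts a CSV export into JSONL records, one paper per line. It is not the logic in scripts/run.py: the function name `csv_export_to_jsonl`, the CSV column names, and the `;` author separator are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical column names; a real export may use different headers.
CORE_FIELDS = ("title", "authors", "year", "url", "abstract")


def csv_export_to_jsonl(csv_text: str, author_sep: str = ";") -> str:
    """Convert CSV text into JSONL, one paper per line (illustrative only)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {field: (row.get(field) or "").strip() for field in CORE_FIELDS}
        # Store authors as an array, splitting on the assumed separator.
        record["authors"] = [
            a.strip() for a in record["authors"].split(author_sep) if a.strip()
        ]
        # Keep year as an integer when it parses, otherwise None.
        record["year"] = int(record["year"]) if record["year"].isdigit() else None
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "title,authors,year,url,abstract\n"
        "Example Paper,A. One; B. Two,2024,https://example.org/p1,Short abstract.\n"
    )
    print(csv_export_to_jsonl(sample))
```

Missing columns come through as empty values rather than raising, so incomplete exports can still be converted and then checked or enriched afterwards.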
Always read:
references/domain_pack_overview.md: how domain packs drive topic-specific behavior
Domain packs (loaded by topic match):
assets/domain_packs/llm_agents.json: pinned IDs, query rewrite rules for LLM agent topics
Use scripts/run.py only for:
id_list backfill
Do not treat run.py as the place for:
assets/domain_packs/)
queries.md (keywords, excludes, time window)
papers/papers_raw.jsonl (JSONL; 1 paper per line)
title, authors, year, url, abstract
arxiv_id, pdf_url, categories, primary_category, published, updated, doi, journal_ref, comment
papers/papers_raw.csv
id_list using --enrich-metadata or queries.md enrich_metadata: true.
queries.md and expand into concrete query strings.
title, authors (array), year, url, abstract
max_results if specified.
papers/papers_raw.jsonl exists.
title, authors, year, url.
papers/papers_raw.jsonl; append notes to STATUS.md.
output/ before writing is approved.
python scripts/run.py --help
python scripts/run.py --workspace <workspace_dir> --query "<query>" --max-results 200
python scripts/run.py --workspace <workspace_dir> --input <export.csv|json|jsonl>
--query <q>: repeatable; multiple queries are unioned
--exclude <term>: repeatable; excludes applied after retrieval
--max-results <n>: cap total retrieved
--input <export.*>: offline mode (CSV/JSON/JSONL)
--enrich-metadata: best-effort enrich via arXiv id_list (needs network)
queries.md also supports: keywords, exclude, time window, max_results, enrich_metadata
python scripts/run.py --workspace <ws> --query "LLM agent" --query "tool use" --exclude "survey" --max-results 300
id_list fetch):
python scripts/run.py --workspace <ws> --query 2509.02547 --max-results 1
papers/import.csv (or .json/.jsonl) under the workspace, then run: python scripts/run.py --workspace <ws>
queries.md):
- time window: { from: 2022, to: 2025 }
then run offline import normally
papers/papers_raw.jsonl is empty
Symptom:
Causes:
queries.md is empty.
Solutions:
papers/import.csv|json|jsonl in the workspace or pass --input.
queries.md.
--query to sanity-check the parser.
Symptom:
authors/year/abstract/url.
Causes:
Solutions:
title, authors, year, url, abstract.
--enrich-metadata to backfill missing fields (best effort).
queries.md has non-empty keywords (or pass --query).
papers/import.* and rerun.
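
The missing-field checks above can be approximated with a small script before rerunning an import or enrichment. This is a minimal sketch, not part of scripts/run.py: the function name `find_incomplete_records` is hypothetical, and treating title, authors, year, url, and abstract as the fields to check is an assumption based on the field lists in this document.

```python
import json

# Fields this document lists for each record; checking all five is an assumption.
CORE_FIELDS = ("title", "authors", "year", "url", "abstract")


def find_incomplete_records(jsonl_text: str) -> list:
    """Return (line_number, missing_fields) for each record lacking a core field."""
    problems = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # None, empty strings, and empty lists all count as missing.
        missing = [f for f in CORE_FIELDS if record.get(f) in (None, "", [])]
        if missing:
            problems.append((line_number, missing))
    return problems


if __name__ == "__main__":
    sample = "\n".join([
        json.dumps({"title": "T", "authors": ["A"], "year": 2024,
                    "url": "u", "abstract": "x"}),
        json.dumps({"title": "T2", "authors": [], "year": None, "url": "u2"}),
    ])
    for line_number, missing in find_incomplete_records(sample):
        print(f"line {line_number}: missing {', '.join(missing)}")
```

To check a real workspace, read papers/papers_raw.jsonl into a string and pass it to the function; an empty result means every record carries all five fields.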