Arxiv Summarizer Orchestrator

v0.1.1

Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for xukp20/arxiv-summarizer-orchestrator.

Install the skill "Arxiv Summarizer Orchestrator" (xukp20/arxiv-summarizer-orchestrator) from ClawHub.
Skill page: https://clawhub.ai/xukp20/arxiv-summarizer-orchestrator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install arxiv-summarizer-orchestrator

ClawHub CLI

npx clawhub@latest install arxiv-summarizer-orchestrator

Security Scan

VirusTotal: Suspicious
OpenClaw: Benign (high confidence)

Purpose & Capability
The skill's name and description match the runtime instructions: it orchestrates three sub-skills (collector, per-paper processor, batch reporter). It requests no env vars, binaries, or installs, which is proportionate for an instruction-only orchestrator. The dependency on the three named sub-skills is expected and coherent.
Instruction Scope
SKILL.md stays within the orchestration scope: it describes how to run scripts in the sub-skills, when to skip papers, how to batch/parallelize, and how to assemble reports. The only runtime reading it asks for is project/run-directory files (per-paper metadata, downloaded source/pdf, summary.md, and runtime throttle/state in the run directory). It does not instruct the agent to read system-wide config, secrets, or unrelated files, nor to post data to unexpected external endpoints.
Install Mechanism
No install spec or code is included (instruction-only), so nothing is written to disk or fetched by this skill itself. That is the lowest-risk install model and is appropriate for an orchestrator.
Credentials
The skill declares no environment variables, credentials, or config paths. This is proportionate. One caveat: the orchestration assumes the three sub-skills exist and those sub-skills (not this orchestrator) may require network access or API keys; those should be inspected separately.
Persistence & Privilege
always is false and the skill does not request persistent presence or elevated platform privileges. It does not modify other skills' configs. Autonomous invocation remains enabled (platform default) but that is expected for skills and is not combined here with other red flags.
Assessment
This orchestrator itself appears coherent and low-risk, but it delegates all network access and downloads to three sub-skills. Before installing or scheduling this skill you should: (1) inspect the source and install metadata for arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter to confirm they come from trusted authors and do not exfiltrate data or call unexpected endpoints; (2) run the workflow in an isolated workspace (dedicated run_dir) with limited filesystem permissions and monitor network activity while testing; (3) verify any scheduling/cron settings, rate-limit configuration, and that language parameters are passed explicitly; (4) confirm no secrets or unrelated system files are needed by the sub-skills. If you can review the three sub-skills and are comfortable with their behavior, this orchestrator is safe to use.


latestvk97856n6r4nhmv2rh8nqhajwcn814vkb
855 downloads
0 stars
2 versions
Updated 1h ago
v0.1.1
MIT-0

ArXiv Summarizer Orchestrator

Run the full pipeline by composing three sub-skills.

Sub-skill Order

  1. arxiv-search-collector
  2. arxiv-paper-processor
  3. arxiv-batch-reporter

Workflow Parameters

  • language: manually specified output language, used by all stages. Defaults to English when omitted.
  • paper_processing_mode: subagent_parallel or serial.
  • max_parallel_papers: default 5 when paper_processing_mode=subagent_parallel.
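The three parameters above can be captured in a small validated container. This is a minimal sketch: the `WorkflowParams` class and its `effective_parallelism` property are hypothetical names used for illustration and are not part of the skill.

```python
from dataclasses import dataclass

# The two modes named in the parameter list above.
VALID_MODES = ("subagent_parallel", "serial")


@dataclass(frozen=True)
class WorkflowParams:
    """Hypothetical container for the orchestrator's workflow parameters."""

    language: str = "English"                         # default when omitted
    paper_processing_mode: str = "subagent_parallel"  # or "serial"
    max_parallel_papers: int = 5                      # only used in parallel mode

    def __post_init__(self):
        if self.paper_processing_mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {self.paper_processing_mode!r}")
        if self.max_parallel_papers < 1:
            raise ValueError("max_parallel_papers must be at least 1")

    @property
    def effective_parallelism(self) -> int:
        # Serial mode always processes exactly one paper at a time.
        if self.paper_processing_mode == "serial":
            return 1
        return self.max_parallel_papers
```

Validating once at the start of a run keeps a mistyped mode from surfacing halfway through Stage B.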

Workflow

Stage A: Collection Setup + Query Retrieval

  1. Initialize one run with arxiv-search-collector/scripts/init_collection_run.py.
  2. The model generates multiple focused queries from the original topic and writes a minimal query_plan.json (label + query only).
  3. Run arxiv-search-collector/scripts/fetch_queries_batch.py with the plan file (recommended).
  4. (Optional fallback) call arxiv-search-collector/scripts/fetch_query_metadata.py manually for one-by-one fetch.
  5. The model reads each indexed query list and decides which indexes to keep.
  6. Merge selected items with arxiv-search-collector/scripts/merge_selected_papers.py.
  7. If relevance or coverage is still insufficient, iterate Stage A:
    • generate another query plan with new labels,
    • fetch again,
    • re-merge with --incremental and an updated selection-json,
    • set weak labels to an empty keep list ([]) to drop them explicitly.
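As an illustration of steps 2, 5, and 7, the sketch below builds a minimal query plan and a selection mapping. The exact JSON schemas expected by the collector scripts are not given here, so both shapes (a list of label/query objects, and a label-to-keep-indexes mapping) are assumptions, and `build_query_plan` and `build_selection` are hypothetical helpers.

```python
import json


def build_query_plan(queries):
    """Return JSON text for a minimal plan: one label + query pair per entry.

    Assumed shape: a JSON list of {"label": ..., "query": ...} objects.
    """
    plan = [{"label": label, "query": query} for label, query in queries]
    return json.dumps(plan, indent=2, ensure_ascii=False)


def build_selection(keep_by_label, drop_labels=()):
    """Return a label -> keep-indexes mapping (assumed selection format).

    Labels listed in drop_labels get an empty keep list, which drops them.
    """
    selection = {label: sorted(indexes) for label, indexes in keep_by_label.items()}
    for label in drop_labels:
        selection[label] = []
    return selection


plan_text = build_query_plan([
    ("core-methods", "diffusion models image synthesis"),
    ("evaluation", "diffusion models benchmark"),
])
selection = build_selection({"core-methods": [0, 2, 5]}, drop_labels=["evaluation"])
```

Before relying on either shape, check the collector's own documentation for the real field names.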

Pass --language <LANG> to collector scripts so all generated markdown files in Stage A follow the selected language. Use serial query fetch in Stage A with conservative controls (for example --min-interval-sec 5, --retry-max 4). Default collector settings already include retries/backoff and run-local throttle state (<run_dir>/.runtime/arxiv_api_state.json), so manual tuning is usually unnecessary. Prefer cache reuse (no --force) unless query parameters changed or data refresh is required.
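To make the effect of a minimum request interval and a retry cap concrete, here is a rough in-memory sketch. It does not reproduce the collector's implementation: `MinIntervalThrottle` and `call_with_retries` are hypothetical, the exponential backoff schedule is an assumption, and the real scripts persist throttle state in the run directory rather than in memory.

```python
import time


class MinIntervalThrottle:
    """Block until at least min_interval_sec has passed since the last call."""

    def __init__(self, min_interval_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_sec = min_interval_sec
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval_sec - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def call_with_retries(fn, throttle, retry_max=4, backoff_sec=1.0, sleep=time.sleep):
    """Call fn() up to retry_max + 1 times, throttled, with doubling backoff."""
    for attempt in range(retry_max + 1):
        throttle.wait()
        try:
            return fn()
        except Exception:
            if attempt == retry_max:
                raise  # retries exhausted: surface the last error
            sleep(backoff_sec * 2 ** attempt)
```

Injecting `clock` and `sleep` keeps the sketch testable without real delays.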

Output: one run directory with per-paper metadata subdirectories.

Stage B: Per-paper Artifact Download + Manual Summary

For each paper directory, invoke sub-skill arxiv-paper-processor once and let that skill produce <paper_dir>/summary.md.

Recommended pre-step for many papers:

  1. Run one batch artifact download before per-paper reading:
python3 arxiv-paper-processor/scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language <LANG>

Per-paper execution steps (inside arxiv-paper-processor):

  1. If <paper_dir>/summary.md already exists and is complete, skip this paper.
  2. If usable source (source/source_extract/*.tex) or PDF (source/paper.pdf) already exists, skip download.
  3. If artifacts are missing, download source with arxiv-paper-processor/scripts/download_arxiv_source.py.
  4. If source is unusable, download PDF with arxiv-paper-processor/scripts/download_arxiv_pdf.py.
  5. The model reads the content and manually writes <paper_dir>/summary.md following the reference format, in the selected language.
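The skip checks in steps 1 to 3 can be expressed as a small decision function. This is a sketch under assumptions: `next_action` is a hypothetical helper, and "complete" is approximated as "the file exists and is non-empty", which may be weaker than the check the processor skill actually applies. The PDF fallback in step 4 happens after a source download attempt and is not modeled here.

```python
from pathlib import Path


def next_action(paper_dir):
    """Return the next step for one paper directory.

    One of: "skip_paper", "read_and_summarize", "download_source".
    """
    paper_dir = Path(paper_dir)

    # Step 1: an existing summary means the paper is done.
    # Assumption: "complete" is approximated as "non-empty".
    summary = paper_dir / "summary.md"
    if summary.is_file() and summary.read_text(encoding="utf-8").strip():
        return "skip_paper"

    # Step 2: usable artifacts already on disk mean no download is needed.
    extract_dir = paper_dir / "source" / "source_extract"
    has_tex = extract_dir.is_dir() and any(extract_dir.glob("*.tex"))
    has_pdf = (paper_dir / "source" / "paper.pdf").is_file()
    if has_tex or has_pdf:
        return "read_and_summarize"

    # Step 3: nothing usable yet, so start with the source download.
    return "download_source"
```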

Parallel strategy for many papers:

  • Default: paper_processing_mode=subagent_parallel with max_parallel_papers=5.
  • Optional: paper_processing_mode=serial to process one paper at a time.
  • In parallel mode, run multiple arxiv-paper-processor instances in batches; concurrent papers must not exceed max_parallel_papers.
  • Wait for one batch to finish before starting the next batch.
  • In serial mode, run exactly one arxiv-paper-processor instance at a time.
  • Each subagent worker should own exactly one paper directory to avoid file conflicts.
  • Do not use scripts to auto-compose summary text; scripts are download-only tools.
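The batching rule above (at most max_parallel_papers at once, and a batch must finish before the next starts) can be sketched as follows. `make_batches` and `run_in_batches` are hypothetical helpers, and `process_paper` is a placeholder callable standing in for one arxiv-paper-processor invocation. Threads are used only to illustrate the scheduling, not how subagents are actually launched.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(paper_dirs, mode="subagent_parallel", max_parallel_papers=5):
    """Split paper directories into batches that respect the concurrency cap."""
    if mode not in ("subagent_parallel", "serial"):
        raise ValueError(f"unknown mode: {mode!r}")
    size = 1 if mode == "serial" else max_parallel_papers
    return [paper_dirs[i:i + size] for i in range(0, len(paper_dirs), size)]


def run_in_batches(paper_dirs, process_paper, mode="subagent_parallel",
                   max_parallel_papers=5):
    """Run process_paper on every directory, one batch at a time."""
    results = {}
    for batch in make_batches(paper_dirs, mode, max_parallel_papers):
        # Leaving the "with" block waits for every worker in this batch,
        # so the next batch cannot start early.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            for paper_dir, result in zip(batch, pool.map(process_paper, batch)):
                results[paper_dir] = result
    return results
```

Because each worker receives exactly one directory from the batch, no two workers write to the same paper directory.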

Output: all paper directories contain summary.md.

Stage C: Bundle + Final Hierarchical Report

  1. Run arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>.
  2. The model reads summaries_bundle.md and writes collection_report_template.md in the run's base directory.
  3. In the template, each paper leaf entry must include one standalone placeholder line: {{ARXIV_BRIEF:<arxiv_id>}}.
  4. Run arxiv-batch-reporter/scripts/render_collection_report.py to generate final collection_report.md.
  5. Do not manually paraphrase per-paper conclusion lines in the final report; they must come from section 10 of each per-paper summary.md via script injection.
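The placeholder mechanism in steps 3 to 5 amounts to line-level substitution. The sketch below illustrates the idea only and is not the code of render_collection_report.py: `render_template` is hypothetical, and the brief text is passed in as a dictionary instead of being extracted from section 10 of each summary.md.

```python
import re

# A placeholder must be the only content on its line (a standalone line).
PLACEHOLDER = re.compile(r"\{\{ARXIV_BRIEF:([^{}\s]+)\}\}")


def render_template(template, briefs):
    """Replace each standalone {{ARXIV_BRIEF:<arxiv_id>}} line with its brief.

    briefs maps arxiv_id -> conclusion text (assumed to be extracted elsewhere).
    """
    rendered = []
    for line in template.splitlines():
        match = PLACEHOLDER.fullmatch(line.strip())
        if match is None:
            rendered.append(line)  # ordinary template text passes through
            continue
        arxiv_id = match.group(1)
        if arxiv_id not in briefs:
            raise KeyError(f"no brief available for {arxiv_id}")
        rendered.append(briefs[arxiv_id])
    return "\n".join(rendered)
```

Failing on a missing id, rather than leaving the placeholder in place, makes an incomplete Stage B visible before the report is published.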

If language is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language.

Periodic Scheduling

This orchestrator is suitable for cron/scheduled execution in OpenClaw:

  • Frequency examples: daily, weekly, monthly.
  • For rolling windows, use lookback (1d, 7d, 30d) when initializing runs.
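A lookback value such as 7d maps naturally onto a rolling date window ending at the time of the run. How the collector interprets lookback internally is not specified here, so the following is only one plausible reading, and `lookback_window` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone


def lookback_window(lookback, now=None):
    """Turn a value like "7d" into a (start, end) pair of UTC datetimes."""
    match = re.fullmatch(r"(\d+)d", lookback)
    if match is None:
        raise ValueError(f"expected a value like '7d', got {lookback!r}")
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=int(match.group(1)))
    return start, end
```

With a daily schedule and a 1d lookback, consecutive runs cover adjacent windows, assuming the job fires at a consistent time.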

Output Layout

<output-root>/<topic>-<timestamp>-<range>/

  • task_meta.json, task_meta.md
  • query_results/, query_selection/
  • <arxiv_id>/metadata.md + downloaded source/pdf + summary.md
  • summaries_bundle.md
  • collection_report_template.md
  • final rendered collection report (e.g. collection_report.md)

Use references/workflow-checklist.md as the execution checklist.
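A quick way to confirm that a run directory matches the layout above is to list what is missing. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and it treats any subdirectory containing metadata.md as a paper directory, which is an assumption about how paper directories can be recognized.

```python
from pathlib import Path

EXPECTED_FILES = (
    "task_meta.json",
    "task_meta.md",
    "summaries_bundle.md",
    "collection_report_template.md",
)
EXPECTED_DIRS = ("query_results", "query_selection")


def missing_outputs(run_dir):
    """Return relative paths of expected outputs absent from run_dir."""
    run_dir = Path(run_dir)
    missing = [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (run_dir / name).is_dir()]

    # Assumption: a paper directory is any subdirectory holding metadata.md.
    paper_dirs = sorted(
        p for p in run_dir.iterdir() if p.is_dir() and (p / "metadata.md").is_file()
    )
    for paper_dir in paper_dirs:
        if not (paper_dir / "summary.md").is_file():
            missing.append(f"{paper_dir.name}/summary.md")
    return missing
```

An empty result after Stage C suggests the run is ready for the final render.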

Related Skills

This is the top-level orchestration skill.

Before using it, install and enable these three sub-skills:

  • arxiv-search-collector
  • arxiv-paper-processor
  • arxiv-batch-reporter

Execution order inside this orchestrator:

  1. arxiv-search-collector (Stage A)
  2. arxiv-paper-processor (Stage B)
  3. arxiv-batch-reporter (Stage C)
