Install
openclaw skills install arxiv-cv-daily
Fetch arXiv papers for a target date in cs.CV, screen them against a user topic, save logs under a user-chosen output directory, download matched PDFs, and summarize the matched papers from full text.
Use this skill when the user gives a topic and a date and wants you to fetch that date's cs.CV papers, screen them against the topic, download the matched PDFs, and summarize them from full text.
Do not use .openclaw/skills/auto-research-ideation for this workflow. This skill must run independently.
When the user calls this skill, follow this sequence:
- Confirm the topic and date.
- Confirm the output root with the user; if they have no preference, use /tmp.
- Run the script with --topic-spec-file plus --output-root.
- Read extracted_text/*.txt before writing any detailed paper summaries.

If the user asks for multiple days, process them sequentially. Finish the first requested day, report progress, then continue to the next day. Keep the logs and outputs separated by day. Do not report the same resolved release day twice in one multi-day run; if fallback for a later requested day lands on a date that was already used earlier in the run, keep walking backward.
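The multi-day rule above can be sketched as follows. This is a minimal illustration, not the script's actual logic: resolve_release_days and the has_papers callback are hypothetical names, and the fallback_days default of 7 is an assumption.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def resolve_release_days(
    start: date,
    days: int,
    has_papers: Callable[[date], bool],
    fallback_days: int = 7,  # assumed default, not taken from the script
) -> list[tuple[date, Optional[date]]]:
    """Pair each requested day with a resolved release day, never reusing one."""
    used: set[date] = set()
    resolved: list[tuple[date, Optional[date]]] = []
    for offset in range(days):
        requested = start - timedelta(days=offset)
        found: Optional[date] = None
        # Walk backward from the requested day, skipping days already reported.
        for step in range(fallback_days + 1):
            candidate = requested - timedelta(days=step)
            if candidate in used:
                continue
            if has_papers(candidate):
                found = candidate
                break
        if found is not None:
            used.add(found)
        resolved.append((requested, found))
    return resolved
```

A requested day resolves to None when no unused day with papers exists inside the fallback window, which the caller can report instead of repeating an earlier day.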
While you work, send short progress updates to the user. The user should know when you are:
Do not claim a paper matches the topic unless the saved title and abstract screening logs support it. If a title is inconclusive, inspect the abstract. If the paper still looks relevant enough to keep, download it and inspect the extracted full text before summarizing.
Default download rule:
Default interaction pattern:
First response after the workflow finishes: List the matched papers day by day. For each paper include:
2026-03-20
1. Paper Title
arXiv ID: 2603.12345
Relevance score: 17
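One way to render this day-by-day list is sketched below. The function name and the dictionary keys (day, title, arxiv_id, score) are hypothetical; the real field names in the saved JSON may differ, and the score label is a parameter so it can follow the user's language.

```python
from collections import defaultdict


def format_matched_papers(papers: list[dict], score_label: str = "Relevance score") -> str:
    """Group matched papers by day and render them as a plain-text list."""
    by_day: dict[str, list[dict]] = defaultdict(list)
    for paper in papers:
        by_day[paper["day"]].append(paper)

    lines: list[str] = []
    # Newest day first; numbering restarts within each day.
    for day in sorted(by_day, reverse=True):
        lines.append(day)
        for index, paper in enumerate(by_day[day], start=1):
            lines.append(f"{index}. {paper['title']}")
            lines.append(f"arXiv ID: {paper['arxiv_id']}")
            lines.append(f"{score_label}: {paper['score']}")
    return "\n".join(lines)
```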
Second-stage analysis: When the user asks to inspect one or more specific papers, then read the saved full text and produce a proper paper analysis.
When analyzing a specific paper in detail, cover these points:
Use this detailed-paper output shape by default:
- Title
- Why It Was Retrieved
- Problem
- Method
- Data / Benchmark
- Main Results
- Why It Matters For The User Topic
- Limitations / Open Questions
- Saved Files

For Why It Was Retrieved, explicitly cite the saved evidence:
For Method, prefer answering:
For Main Results, prefer answering:
If the paper text is incomplete or extraction quality is poor, say so explicitly instead of overclaiming.
The topic spec should be a JSON object. A good default structure is:
{
  "topic": "文档理解",
  "canonical_topic": "document understanding",
  "positive_phrases": [
    "document understanding",
    "visual document understanding",
    "document parsing",
    "pdf parsing",
    "table extraction"
  ],
  "anchor_terms": [
    "document",
    "layout",
    "pdf",
    "table",
    "form"
  ],
  "related_topics": [
    "ocr",
    "table understanding"
  ],
  "negative_phrases": [
    "audio-video understanding",
    "video understanding",
    "3d scene understanding"
  ]
}
Generate this JSON from the user's topic, save it under the chosen output root, and then pass it to the script. Prefer this model-generated topic spec over hardcoded topic expansion.
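Saving the generated spec might look like the sketch below. It assumes the spec lives in a topic_specs directory under the output root (as in the example command path) and that all six keys from the default structure are expected; save_topic_spec is a hypothetical helper, and the script itself may accept a looser spec.

```python
import json
from pathlib import Path

# Keys taken from the default structure shown above; treating all of them
# as required is an assumption of this sketch.
EXPECTED_KEYS = (
    "topic",
    "canonical_topic",
    "positive_phrases",
    "anchor_terms",
    "related_topics",
    "negative_phrases",
)


def save_topic_spec(spec: dict, output_root: str, name: str) -> Path:
    """Validate a topic spec and write it to <output_root>/topic_specs/<name>.json."""
    missing = [key for key in EXPECTED_KEYS if key not in spec]
    if missing:
        raise ValueError(f"topic spec is missing keys: {missing}")
    spec_dir = Path(output_root) / "topic_specs"
    spec_dir.mkdir(parents=True, exist_ok=True)
    path = spec_dir / f"{name}.json"
    # ensure_ascii=False keeps non-ASCII topics readable in the saved file.
    path.write_text(json.dumps(spec, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

The returned path is what would be passed to --topic-spec-file.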
All workflow outputs must be saved under the chosen output_root. Before running the workflow, explicitly confirm that output root with the user unless the user already provided it. If the user explicitly says they have no preference or asks for the default, use /tmp. This includes:
- .txt files

Prefer keeping one run's artifacts under one output root so the user can find them easily.
Resolve the launcher path from the installed skill location in the current deployment environment. Do not hardcode a personal absolute path in instructions. For local shell use, a relative invocation from the repo root is fine:
python3 arxiv_cs_cv_daily.py \
  --topic "document understanding or OCR" \
  --topic-spec-file /tmp/topic_specs/document_understanding.json \
  --date 2026-03-22 \
  --days 3 \
  --category cs.CV \
  --timezone Asia/Shanghai \
  --output-root /tmp \
  --output json \
  --verbose
In deployed environments such as Telegram bots, the bot/backend should resolve the launcher path from the installed skill location. Do not ask the end user for a launcher path. The only filesystem choice that may be exposed at the interaction layer is output_root, and this skill should ask for that before running unless the user already gave it.
- pdftotext available on PATH for PDF text extraction
- write access to /tmp
- network access to export.arxiv.org and arxiv.org

- --topic: required
- --topic-spec-file /path/to/spec.json: optional but preferred; structured topic expansion generated by the model
- --date YYYY-MM-DD: optional; defaults to today in the requested timezone
- --days N: optional; process N consecutive days starting from --date and walking backward one day at a time
- --category: defaults to cs.CV
- --timezone: defaults to UTC
- --fallback-days: how far to walk backward if the requested day has no papers
- --max-results: arXiv API page size
- --limit: limit the number of fetched papers after daily filtering
- --output-root: defaults to /tmp
- --output json|markdown: console output format
- --verbose: include full text content in JSON output (default: false, for more concise output)

The script now provides real-time progress updates:
[1/110] Screening titles... - Title screening progress
[10/110] Screening titles... - Every 10 papers
✓ Screening complete: 87/110 papers matched. - Summary after screening
Downloading 18 PDFs for full-text review... - Before PDF downloads
[1/18] Downloading 2603.20187v1... - Per-PDF download progress

These progress messages appear on stderr and won't interfere with JSON output on stdout.
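Because progress goes to stderr and the result to stdout, a caller can capture only stdout and parse it. The sketch below assumes stdout holds a single JSON document when --output json is passed; build_command and run_workflow are hypothetical helper names, and the launcher path is whatever the deployment resolves.

```python
import json
import subprocess
import sys
from typing import Any


def build_command(
    launcher: str,
    topic: str,
    spec_file: str,
    date: str,
    output_root: str,
    days: int = 1,
) -> list[str]:
    """Assemble the argument list from the documented flags."""
    return [
        sys.executable, launcher,
        "--topic", topic,
        "--topic-spec-file", spec_file,
        "--date", date,
        "--days", str(days),
        "--output-root", output_root,
        "--output", "json",
    ]


def run_workflow(command: list[str]) -> Any:
    """Run the script, let progress stream to the terminal, parse stdout."""
    # Only stdout is captured; stderr is inherited so progress stays visible.
    completed = subprocess.run(
        command, stdout=subprocess.PIPE, text=True, check=True
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with topics that contain spaces.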
Each run creates a readable directory under the chosen output root, for example:
/tmp/2026-03-23_103000_cs-cv_document-understanding-or-ocr_2026-03-22
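The example name appears to combine a start timestamp, the category, a slug of the topic, and the target date. The sketch below reproduces that one example under this reading; the script's real naming rule is not shown in this document, and slugify and run_directory_name are hypothetical names.

```python
import re
from datetime import datetime


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def run_directory_name(
    started_at: datetime, category: str, topic: str, target_date: str
) -> str:
    """Build a name shaped like the example run directory."""
    stamp = started_at.strftime("%Y-%m-%d_%H%M%S")
    return f"{stamp}_{slugify(category)}_{slugify(topic)}_{target_date}"
```

This ASCII-only slug would turn a purely non-ASCII topic into an empty string, so a real implementation would need a fallback such as the canonical topic.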
Typical files:
- 01_request.json
- topic_specs/*.json
- 07_manifest.json
- 07_summary.md
- activity.log
- days/YYYY-MM-DD/02_all_papers.json
- days/YYYY-MM-DD/02_all_papers.md
- days/YYYY-MM-DD/03_title_screening.json
- days/YYYY-MM-DD/04_abstract_screening.json
- days/YYYY-MM-DD/05_matched_papers.json
- days/YYYY-MM-DD/06_downloads_and_text.json
- days/YYYY-MM-DD/07_manifest.json
- days/YYYY-MM-DD/07_summary.md
- days/YYYY-MM-DD/downloads/*.pdf
- days/YYYY-MM-DD/extracted_text/*.txt
- cache/arxiv_api/<category>/<date>.json

After you inspect the saved full text and write the final answer for the user, also save:
- 08_user_summary.md

Use concrete status updates. For example:
- "... cs.CV papers."
- "... all cs.CV papers, and save the title and abstract to the directory you specified."

When the workflow finishes, tell the user:
When the user picks a paper for deeper reading, save the detailed analysis under the same run directory, preferably as:
- 08_user_summary.md for a combined note
- 08_user_summary_<arxiv_id>.md for paper-specific notes

- ... --topic-spec-file is provided.
- ... <output_root>/cache/arxiv_api/... before calling the arXiv API again.
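The cache check mentioned above could be sketched like this, using the cache/arxiv_api/<category>/<date>.json layout from the file list. load_or_fetch is a hypothetical helper, and the fetch callback stands in for whatever code actually calls the arXiv API.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_fetch(
    output_root: str,
    category: str,
    day: str,
    fetch: Callable[[str, str], list],
) -> list:
    """Return cached papers for a category and day, fetching only on a miss."""
    cache_path = Path(output_root) / "cache" / "arxiv_api" / category / f"{day}.json"
    if cache_path.exists():
        # Cache hit: no API call is made.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    papers = fetch(category, day)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(papers, ensure_ascii=False), encoding="utf-8")
    return papers
```

Keying the cache by category and date means a repeated run for the same day under the same output root reuses the earlier response.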