Arxiv Search Collector
v0.1.1Model-guided arXiv paper collection workflow that plans queries, fetches metadata, filters relevance, and merges deduplicated results by language.
⭐ 0· 1.4k·14 current·16 all-time
by@xukp20
MIT-0
Download zip
LicenseMIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
high confidencePurpose & Capability
The skill name/description (model-guided arXiv collection) matches the included scripts and SKILL.md: initializing a run directory, composing queries, fetching results from the arXiv API, letting a model select keep indexes, and merging/deduping results. Required resources (none) are proportionate to the stated purpose.
Instruction Scope
Runtime instructions direct the agent to run the included Python scripts and write/read files under a user-specified run directory; that is expected. The scripts perform HTTP calls only to the official arXiv API (export.arxiv.org). One behavioral note: labels and keep-IDs passed to the scripts are used directly to form filenames and directories without robust sanitization, so if you run these scripts with untrusted inputs they could create or remove files outside the intended query_results/ or per-paper directories (path traversal-ish behavior). This is an implementation-level safety concern but not evidence of hidden exfiltration or unrelated behavior.
Install Mechanism
No install spec; this is instruction-and-script only. That minimizes supply-chain risk. The bundled Python scripts rely on the standard library and perform local file I/O and urllib calls; nothing is downloaded or executed from external/untrusted URLs at install time.
Credentials
The skill requests no environment variables or external credentials. The scripts use only values passed on the command line and data from the run directory; this is proportionate to querying the public arXiv API.
Persistence & Privilege
always is false and the skill does not request persistent platform privileges. The scripts create, update, and may delete files under the user-provided run directory (normal for this tool). There is no modification of other skills' configs or system-wide agent settings in the bundle.
Assessment
This skill appears to do what it claims: batch and single-query fetches from arXiv, plus merging/deduping. Before running it, pick an explicit dedicated output root (do not point --output-root at a system or sensitive directory). Treat plan.json and per-query labels as trusted inputs — avoid running the scripts on untrusted plans since labels and keep-IDs are used verbatim when creating/removing files and directories (the code lacks strong filename sanitization). If you expect to run untrusted plans, run the collector inside a constrained environment (container or sandbox) or inspect/normalize labels first. Finally, the only network calls are to export.arxiv.org; if you need assurance, review the run_dir contents after a fetch and before running merges.Like a lobster shell, security has layers — review code before you run it.
latestvk97fhncneh6xbzk46xjb8rjenh814kxy
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
