Arxiv Skill Learning

Warn

Audited by ClawScan on May 10, 2026.

Overview

This skill warrants review because it can create persistent new skills from arXiv papers and run extractor-generated shell commands in your workspace without a clear approval or sandbox step.

Only run this in a disposable or tightly sandboxed workspace. Before installing or invoking it, require review of the sibling extractor/paper-client code, inspect every generated skill diff and smoke-test command, disable any automatic/hourly use, and ensure failed generations are quarantined or rolled back.

Findings (6)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

A generated skill could alter the agent's future behavior or add unsafe code to the workspace before the user has inspected it.

Why it was flagged

The skill is explicitly designed to generate and commit new executable agent skills, but the artifacts do not show a review gate, approval step, staging area, or rollback mechanism before changing the workspace.

Skill content

**Extract**: Uses `arxiv-skill-extractor` to generate skill code. ... **Solidify**: Commits the new skill to the workspace.

Recommendation

Require explicit user approval after showing a diff of generated files, stage new skills separately, and provide rollback/removal instructions before solidifying anything.
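
One way to picture this recommendation is the minimal sketch below. It is not the skill's code: `stageSkill`, `promoteSkill`, and `rollbackSkill` are hypothetical helpers, and the directory layout is an assumption. Generated files land in a staging directory first, and nothing moves into the live skills directory unless the caller passes an explicit approval flag.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: write generated files into a staging directory
// rather than the live skills directory, so they can be diffed first.
function stageSkill(stagingRoot, skillName, files) {
  const stagedDir = path.join(stagingRoot, skillName);
  fs.mkdirSync(stagedDir, { recursive: true });
  for (const [relPath, content] of Object.entries(files)) {
    const target = path.join(stagedDir, relPath);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, content);
  }
  return stagedDir;
}

// Hypothetical helper: move a staged skill into place only when the
// caller passes approved === true. Assumes both directories are on the
// same filesystem, since renameSync does not cross devices.
function promoteSkill(stagedDir, skillsRoot, approved) {
  if (approved !== true) return null;
  fs.mkdirSync(skillsRoot, { recursive: true });
  const finalDir = path.join(skillsRoot, path.basename(stagedDir));
  fs.renameSync(stagedDir, finalDir);
  return finalDir;
}

// Hypothetical helper: remove a promoted (or staged) skill directory.
function rollbackSkill(skillDir) {
  fs.rmSync(skillDir, { recursive: true, force: true });
}
```

How approval is collected (a prompt, a reviewed diff, a pull request) is left open here; the point is only that promotion is a separate, explicit step.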

What this means

A generated smoke-test command could run arbitrary local commands, modify files, install packages, or access data available from the workspace.

Why it was flagged

The command executed by the shell comes from the extractor result rather than from a fixed allowlist, and it runs at the workspace root with no confirmation or sandbox step shown in the artifacts.

Skill content

await execAsync(extractionResult.smokeTestCommand, { encoding: 'utf8', timeout: 60000, cwd: WORKSPACE_ROOT });

Recommendation

Avoid raw shell execution for generated commands; use a fixed test harness, validate/allowlist commands and arguments, and run tests in a disposable sandbox with explicit user approval.
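
As an illustration only, the sketch below validates a generated command against a small allowlist and runs it with `execFile`, which does not spawn a shell. The allowlist contents, `validateCommand`, `runSmokeTest`, and the `sandboxDir` parameter are assumptions made for this example, not part of the skill.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Hypothetical allowlist: binary name -> patterns that each argument must match.
// The script pattern rejects absolute paths and any ".." segment.
const ALLOWED_COMMANDS = new Map([
  ['node', [/^--test$/, /^(?!.*\.\.)(?!\/)[\w./-]+\.js$/]],
]);

// Split the generated command on whitespace and check every token.
// Throws if the binary or any argument is not allowlisted.
function validateCommand(commandLine) {
  const [bin, ...args] = commandLine.trim().split(/\s+/);
  const patterns = ALLOWED_COMMANDS.get(bin);
  if (!patterns) {
    throw new Error(`Command not allowlisted: ${bin}`);
  }
  for (const arg of args) {
    if (!patterns.some((re) => re.test(arg))) {
      throw new Error(`Argument not allowlisted: ${arg}`);
    }
  }
  return { bin, args };
}

// Run the validated command without a shell, inside a caller-supplied
// sandbox directory instead of the workspace root.
async function runSmokeTest(commandLine, sandboxDir) {
  const { bin, args } = validateCommand(commandLine);
  return execFileAsync(bin, args, { encoding: 'utf8', timeout: 60000, cwd: sandboxDir });
}
```

Allowlisting only constrains the shape of the command. A permitted `node some/test.js` still executes generated code, so the disposable sandbox and the approval step remain necessary.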

What this means

The user cannot assess what code actually fetches papers or generates skills, even though those helpers drive high-impact workspace changes.

Why it was flagged

Core behavior depends on sibling components that are not included in the provided manifest or declared in package.json, including the component that generates skills.

Skill content

const paperClientPath = path.resolve(__dirname, '../arxiv-paper-reviews/paper_client'); ... const extractorPath = path.resolve(__dirname, '../arxiv-skill-extractor/index');

Recommendation

Package or pin the required helper components, declare them in metadata, and make their source and versions reviewable before running the learning workflow.
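
Pinning could take several forms. The sketch below shows one possibility, assuming the reviewer records a SHA-256 digest for each helper file after reading it; `sha256OfFile` and `verifyPinnedHelpers` are hypothetical names, and the pin map format is invented for illustration.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Compute the hex-encoded SHA-256 digest of a file's contents.
function sha256OfFile(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
}

// Hypothetical check: `pins` maps a helper file path to the digest recorded
// at review time. Returns one entry per file that is missing or has changed.
function verifyPinnedHelpers(pins) {
  const mismatches = [];
  for (const [filePath, expected] of Object.entries(pins)) {
    const actual = fs.existsSync(filePath) ? sha256OfFile(filePath) : null;
    if (actual !== expected) {
      mismatches.push({ filePath, expected, actual });
    }
  }
  return mismatches;
}
```

A workflow built on this would refuse to load the helpers whenever the returned list is non-empty.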

What this means

Broken or unsafe generated artifacts may remain in the workspace and influence later runs or future agent behavior.

Why it was flagged

A failed generated skill or smoke test can still be treated as a successful learning outcome and recorded persistently, with no cleanup or containment shown in the artifacts.

Skill content

result.testStatus = 'failed'; ... // Continue to record success of learning, but note test failure ... result.status = 'success'; recordLearnedPaper(targetPaper.paper_key, extractionResult.skillName, result);

Recommendation

Fail closed: delete or quarantine generated files on test failure, do not record success unless tests pass, and require review before promoting a skill.
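
A fail-closed version of that flow might look like the sketch below. It is an assumption-laden illustration: `finalizeLearning`, `shouldRecord`, and the quarantine directory are hypothetical, and only the `status` and `testStatus` field names are taken from the quoted code.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: decide the outcome from the smoke-test result.
// On failure the generated skill directory is moved into quarantine and
// the status is 'failed', never 'success'. Assumes skillDir and
// quarantineRoot are on the same filesystem.
function finalizeLearning({ testPassed, skillDir, quarantineRoot }) {
  if (testPassed) {
    return { status: 'success', testStatus: 'passed', skillDir };
  }
  fs.mkdirSync(quarantineRoot, { recursive: true });
  const quarantinedDir = path.join(
    quarantineRoot,
    `${path.basename(skillDir)}-${Date.now()}`
  );
  fs.renameSync(skillDir, quarantinedDir);
  return { status: 'failed', testStatus: 'failed', skillDir: quarantinedDir };
}

// Hypothetical guard: only passing results are recorded as learned.
function shouldRecord(result) {
  return result.status === 'success' && result.testStatus === 'passed';
}
```

With this shape, a call such as `recordLearnedPaper(...)` would sit behind `shouldRecord(result)`, so a failed test never produces a persistent success entry.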

What this means

The skill may generate skills from papers outside the scope a user expects from the documented configuration.

Why it was flagged

SKILL.md documents the target categories as cs.AI, cs.CL, cs.LG, and cs.SE, but the code also searches cs.CV, cs.RO, cs.CY, and cs.MA, and then falls back to an unfiltered fetch of recent papers.

Skill content

const primaryCategories = ['cs.AI', 'cs.LG', 'cs.CL', 'cs.CV', 'cs.RO', 'cs.SE', 'cs.CY', 'cs.MA'] ... const papers = await paperClient.listPapers({ limit: 20 });

Recommendation

Align the documented configuration with code, enforce a user-configured category allowlist, and make any broad fallback opt-in.
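
To make the allowlist idea concrete, here is a small sketch. It assumes each paper object carries a `categories` array, which the artifacts do not confirm; `selectCandidates` and the `allowBroadFallback` option are hypothetical names. The default list mirrors the four categories documented in SKILL.md.

```javascript
// Categories documented in SKILL.md, used here as the default allowlist.
const DOCUMENTED_CATEGORIES = ['cs.AI', 'cs.CL', 'cs.LG', 'cs.SE'];

// Hypothetical selector: keep only papers with at least one allowlisted
// category. The unfiltered fallback is off unless the caller opts in.
// Assumes paper.categories is an array of arXiv category strings.
function selectCandidates(papers, options = {}) {
  const { allowlist = DOCUMENTED_CATEGORIES, allowBroadFallback = false } = options;
  const allowed = new Set(allowlist);
  const matching = papers.filter((paper) =>
    (paper.categories || []).some((category) => allowed.has(category))
  );
  if (matching.length > 0) return matching;
  return allowBroadFallback ? papers : [];
}
```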

What this means

The workspace will keep a record of learned paper keys, skill names, timestamps, and outcomes.

Why it was flagged

The skill maintains persistent learned-paper state that can affect future selection behavior; this appears purpose-aligned and limited, but it is retained across runs.

Skill content

const LEARNED_DB_PATH = path.join(WORKSPACE_ROOT, 'memory/evolution/learned_papers.json'); ... fs.writeFileSync(LEARNED_DB_PATH, JSON.stringify(db, null, 2));

Recommendation

Document this file, provide a safe reset/clear option, and ensure users can review or edit the retained state.
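
A reset option could be as small as the sketch below. The helper names are hypothetical, and treating an empty object as the cleared state is an assumption, since the artifacts do not show the structure of `db`. The existing file is copied to a timestamped backup before it is overwritten, so the reset itself can be undone.

```javascript
const fs = require('node:fs');

// Hypothetical helper: read the learned-paper state for review.
// Returns an empty object when the file does not exist yet.
function readLearnedDb(dbPath) {
  if (!fs.existsSync(dbPath)) return {};
  return JSON.parse(fs.readFileSync(dbPath, 'utf8'));
}

// Hypothetical helper: back up the current state, then clear it.
// Returns the backup path, or null when there was nothing to reset.
// ASSUMPTION: an empty object is a valid "cleared" state for this file.
function resetLearnedDb(dbPath) {
  if (!fs.existsSync(dbPath)) return null;
  const backupPath = `${dbPath}.bak-${Date.now()}`;
  fs.copyFileSync(dbPath, backupPath);
  fs.writeFileSync(dbPath, JSON.stringify({}, null, 2));
  return backupPath;
}
```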