Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

paper-summary-scripted

v1.0.0

Download arXiv paper PDFs or accept local paper files with a preprocessing script, then extract text, clean text, and generate a summary version, detailed ve...

1 · 16 · 0 current · 0 all-time
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the code and SKILL.md. The only network access is deterministic arXiv PDF download logic; local file reads and text extraction are expected for a preprocessing summarization tool.
Instruction Scope
Runtime instructions only call the included Python script, then read manifest/extracted text and run generation stages. The script legitimately reads local files you supply and writes outputs under the provided output-dir. This is expected, but be aware it will access any local file paths you pass via --paperfiles.
Install Mechanism
No install spec; the skill is instruction-only with a bundled script. The script uses standard stdlib modules and optional third-party pypdf for PDF extraction; pypdf is required at runtime for PDFs but is not installed by the skill automatically (no installer present). No downloads from untrusted hosts — PDF fetches target arxiv.org only.
Credentials
No environment variables, credentials, or config paths are requested. The script performs only arXiv downloads and local file reads provided by the user; no unrelated secrets are required or accessed.
Persistence & Privilege
The "always" flag is false, and the skill does not attempt to modify other skills or system-wide settings. It writes outputs to the user-specified output directory only.
Assessment
This skill appears to do what it claims: it will download PDFs from arXiv (only) and read any local file paths you pass, extract text (requires pypdf for PDFs), write downloaded PDFs and extracted text into the output directory (./runs/paper-summary in the example), and then the agent will generate the three summarization stages plus a quality check. Before using: (1) ensure you only pass local file paths you trust (the script will open them); (2) be aware there is no built-in limit on downloaded file sizes or storage usage — provide an appropriate output directory and monitor disk space; (3) install pypdf if you need PDF extraction; (4) the prompts ask for citation-like support evidence (page numbers), but extracted plain text may not include page markers — expect that the contribution extraction step may need careful review. No credentials or external endpoints beyond arxiv.org are used.

Like a lobster shell, security has layers — review code before you run it.

latest vk97cw4366td8x4ka1n3n2jmrks856kp2
16 downloads
1 star
1 version
Updated 7h ago
v1.0.0
MIT-0

Paper summary generation with a preprocessing script

Overview

Use this skill when arXiv paper URLs or local paper files need deterministic preprocessing before the four-stage paper summarization workflow runs. The bundled script downloads arXiv PDFs to local storage when URLs are provided, then handles extraction and cleaning. Do not parse paper web pages or use HTML content as the paper source. After preprocessing, run three independent generation stages from the same cleaned paper text, then a fourth verification stage that evaluates all three generated outputs against the original text.

Canonical inputs

Normalize the request into:

  • language
  • paperurls for arXiv inputs
  • paperfiles for local paper files

Treat empty string, [], null, None, missing field, or blank list as empty.
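The emptiness rule above can be sketched as a small helper (the function name and the treatment of literal "null"/"None" strings are illustrative assumptions, not part of the skill):

```python
def is_empty(value) -> bool:
    """Return True for inputs the skill treats as empty: a missing
    field (None), an empty string, an empty list, the literal strings
    "null"/"None", or a list containing only blank strings."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() in ("", "null", "None")
    if isinstance(value, list):
        # all() over an empty list is True, so [] counts as empty too
        return all(isinstance(v, str) and v.strip() == "" for v in value)
    return False
```

A normalizer built on this would treat `is_empty(paperurls) and is_empty(paperfiles)` as the error condition in workflow step 1.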

Workflow

  1. If both paperurls and paperfiles are empty, return an error immediately.
  2. Run the preprocessing script:
    • python scripts/process_papers.py --language "<language>" --paperurls '<paperurls>' --paperfiles '<paperfiles>' --output-dir ./runs/paper-summary
  3. Read manifest.json in the output directory.
  4. For each successful item, read the extracted_text_path file and treat its contents as cleaned_text.
  5. Generate these three sections separately from the same cleaned_text:
    • summary version
    • detailed version
    • contribution extraction
  6. After the three sections are complete, run quality judgment using:
    • original cleaned paper text
    • summary version
    • detailed version
    • contribution extraction
  7. Merge the outputs using references/output-template.md.
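Steps 2 through 4 can be sketched as a minimal driver. The manifest field names `items`, `status`, and `source` are assumptions here (check references/script-usage.md for the real definitions); only `extracted_text_path` is named by this workflow:

```python
import json
import subprocess
from pathlib import Path

def run_preprocessing(language, paperurls, paperfiles,
                      output_dir="./runs/paper-summary"):
    """Step 2: invoke the bundled script exactly as the workflow specifies."""
    subprocess.run(
        ["python", "scripts/process_papers.py",
         "--language", language,
         "--paperurls", json.dumps(paperurls),
         "--paperfiles", json.dumps(paperfiles),
         "--output-dir", output_dir],
        check=True,
    )

def load_cleaned_texts(output_dir="./runs/paper-summary"):
    """Steps 3-4: read manifest.json, then read extracted_text_path
    for each successful item and return its contents as cleaned_text."""
    manifest = json.loads(
        Path(output_dir, "manifest.json").read_text(encoding="utf-8"))
    texts = {}
    for item in manifest.get("items", []):        # field name assumed
        if item.get("status") == "success":       # field name assumed
            path = Path(item["extracted_text_path"])
            texts[item.get("source", str(path))] = path.read_text(encoding="utf-8")
    return texts
```

Each value in the returned mapping is then fed independently into the three generation stages of step 5.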

Preprocessing rules

The script does deterministic preprocessing only. Treat URL inputs as arXiv identifiers, arXiv abstract URLs, or arXiv PDF URLs that must resolve to a PDF download. Do not attempt webpage parsing, HTML extraction, or generic site scraping. Do not use the script's previews as a substitute for the full extracted text. Treat manifest failures, partial extraction notes, or unsupported formats as evidence that the source may be incomplete.
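One way the deterministic arXiv normalization could work — mapping a bare identifier, an abstract URL, or a PDF URL onto a single PDF download URL, and rejecting everything else so the caller never falls back to webpage scraping — is sketched below. This is an illustration of the rule, not the script's actual implementation, and it covers only new-style identifiers:

```python
import re

# New-style arXiv identifier (e.g. 2301.12345, optionally versioned)
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?$")

def to_pdf_url(source: str) -> str:
    """Normalize an arXiv identifier, abs URL, or pdf URL to a PDF URL.
    Raises ValueError for anything not recognizably arXiv."""
    source = source.strip().removesuffix(".pdf")
    m = ARXIV_ID.search(source)
    if m is None or ("//" in source and "arxiv.org" not in source):
        raise ValueError(f"not an arXiv source: {source}")
    return f"https://arxiv.org/pdf/{m.group(0)}.pdf"
```

Note how a non-arXiv host raises instead of degrading to HTML extraction, matching the "no generic site scraping" rule.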

Generation-stage rules

Consult references/prompts.md for the exact Dify-style prompt patterns and variable mapping.

Summary version

Generate in the requested language. Must include when available:

  • original title
  • research background or pain point
  • core method name
  • at least one key experimental number

If no explicit experimental result is provided in the source, state 原文未提供具体实验数据 ("the source does not provide specific experimental data") or the equivalent in the requested language. Do not add praise or filler.

Detailed version

Generate in the requested language. Use this exact structure (the Chinese headings are literal; in English they read: 1. Background & Motivation, 2. Core Method, 3. Experimental Setup, 4. Main Results & Ablations, 5. Limitations, if any):

  • ### 1. 背景与动机
  • ### 2. 核心方法
  • ### 3. 实验设置
  • ### 4. 主要结果与消融实验
  • ### 5. 局限性(若有)

Only include content supported by the extracted text.

Contribution extraction

Generate in the requested language. Each contribution must be an independent innovation point, not an experimental observation. Each one must include source-grounded support evidence without inventing citations or page numbers.

Quality judgment

Run this only after the three generated sections exist. Evaluate summary, detailed, and contribution outputs separately against the original cleaned text. For each one, provide a 1-5 score and a concrete error list.

Manifest-aware confidence rules

Downgrade confidence or mention extraction risk when the manifest shows:

  • download failure
  • arxiv source normalization failure
  • partial parsing
  • fallback decoding
  • missing quantitative evidence
  • unreadable PDF or DOCX parsing problems
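As an illustration, the downgrade rule above could be checked mechanically against a manifest item. The flag names here are hypothetical; match them to whatever fields references/script-usage.md actually defines:

```python
# Hypothetical flag names for the risk conditions listed above
RISK_FLAGS = {
    "download_failure",
    "arxiv_normalization_failure",
    "partial_parsing",
    "fallback_decoding",
    "missing_quantitative_evidence",
    "unreadable_document",
}

def extraction_risks(item: dict) -> list[str]:
    """Return the risk flags present on a manifest item; a non-empty
    result means confidence should be downgraded and the risk mentioned."""
    return sorted(RISK_FLAGS & set(item.get("flags", [])))
```
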

Non-negotiable constraints

  • Never fabricate paper content missing from the extracted text.
  • Keep the three generation stages independent before the quality stage.
  • Preserve the requested language.
  • Keep different papers separate unless the user explicitly asks for a comparison.

Resources

  • scripts/process_papers.py: normalize arXiv inputs, download PDFs or read local files, extract text, clean text, and emit manifest.json
  • references/prompts.md: exact Dify-style prompt logic and variable mapping
  • references/output-template.md: final response template
  • references/script-usage.md: script I/O and manifest field definitions
