Document Workflow
v1.3.1: One-click search, download, chunked text extraction, and structured summarization of academic papers, with filtering by year and citation count.
Security Scan
OpenClaw verdict: Suspicious (medium confidence)
Purpose & Capability
Name and description align with the code: searching, downloading PDFs, and extracting LaTeX from arXiv. However, the search script expects an external 'mcporter' binary (used to call Tavily), yet the skill's manifest/requirements do not declare this dependency. An undeclared binary dependency that spawns a local executable is inconsistent with the 'no required binaries' metadata.
Instruction Scope
SKILL.md instructs running the included Python scripts to search, download, and parse LaTeX — which is consistent. The scripts perform network I/O (HTTP GETs to arxiv.org and Semantic Scholar/openAccessPdf URLs) and write files to disk (default path is a Windows Desktop folder). SKILL.md does not clearly warn about executing included code, reliance on local binaries, or the security implications of downloading arbitrary PDF/source URLs.
Install Mechanism
There is no install spec (instruction-only), but the skill bundles runnable Python scripts that the agent will execute. Because execution happens without an explicit install step, the agent may run these scripts directly — this is lower friction but means the provided code will be executed on the host. Additionally, the scripts call external binaries (mcporter) via subprocess when using Tavily; that external dependency is not declared.
Credentials
The skill declares no required env vars, and SKILL.md mentions an optional SEMANTIC_SCHOLAR_API_KEY. The code uses os.environ.get for that key but also supplies a hard-coded fallback API key embedded in the script. Hard-coded keys are a concern (they may be stale, abused, or belong to someone else). Otherwise, the skill does not request broad credentials or unrelated environment access.
Persistence & Privilege
The skill is not marked 'always', and it doesn't request persistent elevated privileges or modify other skills. It writes files to a user-specified download directory (defaulting to a Windows Desktop path), which is normal for a downloader but worth noting as file-system write activity.
What to consider before installing
This skill appears to do what it says (search arXiv/Semantic Scholar, download PDFs, fetch LaTeX source, and parse .tex files), but there are a few red flags you should consider before installing or running it:
- Undeclared dependency: search_papers.py will try to call a local 'mcporter' binary to use Tavily. The skill metadata lists no required binaries. If you don't have mcporter, the script falls back, but if mcporter is present the skill will execute it. Verify what 'mcporter' is and only allow it if you trust that binary.
- Hard-coded API key: a Semantic Scholar API key is embedded as a fallback in the script. Hard-coded keys can be abused or invalid; set your own SEMANTIC_SCHOLAR_API_KEY in the environment instead of relying on the embedded value.
- File writes and downloads: the scripts will download remote archives and PDFs and extract/write files to disk (default path is C:\Users\Lenovo\Desktop\papers). Run in a sandbox or adjust the download directory to a safe location and ensure you trust the sources of downloaded URLs.
- Executing bundled code: although there is no installer, the agent will run the included Python modules. Audit the scripts (you already have them) and consider running them manually in a controlled environment before granting the skill autonomous invocation.
Recommended actions:
- If you want to use this skill, set SEMANTIC_SCHOLAR_API_KEY yourself and remove or replace the hard-coded fallback from a trusted copy of the code.
- Inspect or block use of 'mcporter' unless you know its origin and trust it. Consider editing the skill to remove the mcporter/Tavily path if you won't use it.
- Run the scripts in a sandboxed environment (container or VM) or with limited filesystem/network permissions to observe behavior before enabling in production.
- If you are not comfortable auditing code, avoid installing the skill or ask the maintainer for clarity on mcporter and the embedded API key.
Tags: arxiv, latest, paper, pdf, research
Document Workflow
Academic paper research: Search → Download LaTeX → Read & Summarize
Quick Start
1. Search Papers
python -m skills.document-workflow.scripts.search_papers --query "world model" --max_results 5 --year_from 2024
2. Download LaTeX Source
python -m skills.document-workflow.scripts.latex_reader "2301.07088" --keep
3. Read & Summarize
Read the LaTeX source files and summarize following the reading guide below.
Reading Guide
After downloading LaTeX source to arxiv_{id}/, read the .tex files in this order:
Step 1: Get Metadata
Read the main .tex file (usually main.tex, root.tex, or {paper-id}.tex) for:
- \title{} - Paper title
- \author{} - Authors
- \begin{abstract}...\end{abstract} - Abstract
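For Step 1, the metadata fields above can be pulled out programmatically. A minimal sketch (the function name `extract_metadata` is illustrative, not part of the skill's scripts): simple regexes that handle the common single-line cases, not every LaTeX variant (nested braces, multi-line \title, etc.).

```python
import re
from pathlib import Path

def extract_metadata(tex_path):
    """Pull title, authors, and abstract from a main .tex file.

    A rough sketch: regexes cover the common single-line forms only.
    """
    src = Path(tex_path).read_text(encoding="utf-8", errors="replace")
    title = re.search(r"\\title\{([^}]*)\}", src)
    author = re.search(r"\\author\{([^}]*)\}", src)
    # re.S lets '.' span newlines inside the abstract environment
    abstract = re.search(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", src, re.S)
    return {
        "title": title.group(1).strip() if title else None,
        "authors": author.group(1).strip() if author else None,
        "abstract": abstract.group(1).strip() if abstract else None,
    }
```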
Step 2: Understand the Problem
Read the Introduction section (usually intro.tex, 1-introduction.tex, or first \section):
- What problem does this paper solve?
- What are the key contributions?
- How does it relate to prior work?
Step 3: Understand the Method
Read the Method/Approach section:
- What is the proposed approach?
- Key equations in \begin{equation}...\end{equation} or \begin{align}...\end{align}
- Algorithm pseudocode in \begin{algorithm}...\end{algorithm}
Step 4: Check Experiments
Read the Experiments section:
- Datasets used
- Baselines compared
- Metrics in \begin{table}...\end{table} with results
- Key findings
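The environment blocks mentioned in Steps 3 and 4 (equation, algorithm, table) can be located with one generic scan. A sketch under simplifying assumptions (the helper name `find_environments` is illustrative; it assumes non-nested environments):

```python
import re
from pathlib import Path

def find_environments(tex_path, env="table"):
    """List all \\begin{env}...\\end{env} blocks in a .tex file.

    Works for "table", "equation", "algorithm", etc.; the optional
    '*' handles starred variants like table*. Assumes no nesting.
    """
    src = Path(tex_path).read_text(encoding="utf-8", errors="replace")
    pattern = re.compile(
        r"\\begin\{%s\*?\}(.*?)\\end\{%s\*?\}" % (env, env), re.S
    )
    return [m.group(0) for m in pattern.finditer(src)]
```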
Step 5: Check References
Read the .bib or .bbl file for:
- Related work citations
- Key papers in the field
Output Schema
Summarize the paper in this JSON format (see more details in ./references/output_schema.json):
{
"paper_title": "Full title",
"authors": ["Author 1", "Author 2"],
"source": "arXiv:XXXX.XXXXX",
"task_definition": {
"domain": "Research domain",
"task": "Specific task",
"problem_statement": "What problem this paper solves",
"key_contributions": ["Contribution 1", "Contribution 2"]
},
"experiments": {
"datasets": ["Dataset 1", "Dataset 2"],
"baselines": ["Baseline 1", "Baseline 2"],
"metrics": [
{"name": "Metric name", "description": "What it measures","definition":"Mathematical definition or formula for the metric"}
],
"results": [
{"setting": "Dataset", "metric": "Metric", "proposed_method": "Score", "best_baseline": "Score"}
],
"key_findings": ["Finding 1", "Finding 2"]
}
}
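A summary produced against the schema above can be sanity-checked before use. A light sketch (the `check_summary` helper and the key set are illustrative; the full schema lives in ./references/output_schema.json, which this does not load):

```python
import json

# Top-level keys from the schema shown above
REQUIRED_KEYS = {"paper_title", "authors", "source", "task_definition", "experiments"}

def check_summary(text):
    """Parse a summary JSON string and report missing top-level keys."""
    data = json.loads(text)
    missing = REQUIRED_KEYS - data.keys()
    return data, sorted(missing)
```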
Scripts
| Script | Function |
|---|---|
| search_papers.py | Search papers (Tavily + Semantic Scholar) |
| download_paper.py | Download PDF (for human reading) |
| latex_reader.py | Download LaTeX source (for AI reading) |
Tips for Reading LaTeX
| LaTeX Command | Meaning |
|---|---|
| \section{Title} | Section heading |
| \subsection{Title} | Subsection heading |
| \textbf{text} | Bold text (often important) |
| \cite{key} | Citation reference |
| \begin{equation}...\end{equation} | Numbered equation |
| \begin{table}...\end{table} | Table |
| \begin{figure}...\end{figure} | Figure |
| \input{file} or \subfile{file} | Include another .tex file |
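Because papers are often split across files via \input and \subfile (last row above), a reader needs the full file list in document order. A sketch of one way to walk the includes (the name `resolve_inputs` is illustrative; it assumes include arguments are plain relative paths, optionally without the .tex extension, and skips missing files):

```python
import re
from pathlib import Path

def resolve_inputs(main_tex, seen=None):
    """Return the main .tex file plus every file pulled in via
    \\input or \\subfile, depth-first, in reading order."""
    seen = seen if seen is not None else []
    path = Path(main_tex)
    if not path.suffix:
        path = path.with_suffix(".tex")  # \input{intro} means intro.tex
    if not path.exists() or path in seen:
        return seen
    seen.append(path)
    src = path.read_text(encoding="utf-8", errors="replace")
    for m in re.finditer(r"\\(?:input|subfile)\{([^}]+)\}", src):
        resolve_inputs(path.parent / m.group(1), seen)
    return seen
```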
Config
# Optional: Semantic Scholar API key
export SEMANTIC_SCHOLAR_API_KEY="your-key"
# Default download path
C:\Users\Lenovo\Desktop\papers
