Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

文献检索与下载全流程 (Literature Search & Download Pipeline)

v1.0.3

End-to-end automation of academic literature search and download. Trigger this skill when the user asks to search for literature, download a paper, find academic materials, or search for papers, or says things like "find me literature related to XX", "download this paper", or "I need a certain paper". Full pipeline: search → recommend → multi-channel download → Ablesci (科研通) request monitoring → notify → progress tracking.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for oreosofat/literature-research-pipeline.

Prompt preview: Install & Setup
Install the skill "文献检索与下载全流程" (oreosofat/literature-research-pipeline) from ClawHub.
Skill page: https://clawhub.ai/oreosofat/literature-research-pipeline
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: LIT_DOWNLOAD_DIR, LIT_PROGRESS_FILE
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install literature-research-pipeline

ClawHub CLI

Package manager switcher

npx clawhub@latest install literature-research-pipeline
Security Scan

VirusTotal: Suspicious (View report →)
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill claims end-to-end literature search and download, and its declared env vars (download dir, progress file, optional API keys) plus filesystem and subprocess access are consistent with that. However, the addition of browser-cdp and instructions to extract cookies/CSRF tokens and post on the user's behalf (to ablesci.com) is a materially more sensitive capability than most download helpers require — it can be justified for 'post for help' functionality but is higher‑privilege than a pure downloader.
Instruction Scope
Runtime instructions explicitly tell the agent to locate and execute an external skill's script from the user's workspace, connect to the local browser via CDP to retrieve CSRF tokens and cookies for ablesci.com, and post help requests; then create cron jobs that repeatedly connect to the browser and automatically act on responses. These actions access browser session secrets and perform autonomous remote postings and downloads. The instructions also require reading/writing arbitrary progress files and invoking subprocesses — all of which broaden the attack surface beyond simple HTTP API calls.
Install Mechanism
This is an instruction-only skill (no install spec, no downloaded archives or third-party packages), which minimizes supply-chain/install-time risk.
Credentials
Declared environment variables (download dir, progress file, optional API keys/email) are reasonable. But the skill’s behavior depends on runtime access to browser cookies and CSRF tokens via CDP (not declared as an env var) — that is effectively access to session credentials. It also reads other skills from the workspace and executes their scripts via subprocess, which may introduce unexpected privileges if those scripts are untrusted.
Persistence & Privilege
The skill sets up recurring cron checks (every 30 minutes) that will autonomously connect to the browser and external site to monitor and download results. While 'always' is false, cron + browser-cdp + cookie access gives a recurring, autonomous capability with a nontrivial blast radius if abused.
What to consider before installing
Before installing or enabling this skill:

1. Understand it will ask the agent to control your browser via the CDP port and read CSRF tokens/cookies for ablesci.com so it can post and later act on those posts — this gives it the ability to act on your logged-in session for that site.
2. It will read and execute a script from another local skill (academic-literature-search) via subprocess — inspect that script first.
3. It will create recurring cron tasks that run every ~30 minutes and perform automated actions; if you don't want recurring autonomous operations, do not enable cron or browser-cdp.
4. If you trust the source, audit the academic-literature-search skill code and test in an isolated environment (or with a browser profile that is not logged into sensitive accounts).
5. If you are uncomfortable with browser cookie/session access, decline the browser-cdp permission or require a dedicated, logged-in browser profile limited to the target site.
6. Confirm that posting to third-party services (ablesci.com) and automated downloads comply with your institution's policies and the target site's terms.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Environment variables
| Variable | Required | Description |
|----------|----------|-------------|
| LIT_DOWNLOAD_DIR | required | Directory where downloaded papers are saved |
| LIT_PROGRESS_FILE | required | Path to the download progress tracking file |
| LIT_CDP_PORT | optional | Browser remote debugging port |
| LIT_NOTIFY_CHANNEL | optional | Notification channel (e.g. wechat-access, telegram) |
| LIT_NOTIFY_USER | optional | Notification target user ID |
| SEMANTIC_SCHOLAR_API_KEY | optional | Semantic Scholar API key |
| LIT_UNPAYWALL_EMAIL | optional | Email required by the Unpaywall API |
Tags: ablesci · academic-search · crossref · latest · literature-search · paper-download · research-automation · sci-tools · semantic-scholar
95 downloads
1 star
4 versions
Updated 2w ago
v1.0.3
MIT-0

文献检索与下载全流程 (Literature Search & Download Pipeline)

Overview

End-to-end academic literature search and download automation. Take the user's research topic → search the literature → recommend high-value targets → download via multiple channels → monitor Ablesci (科研通) help requests → auto-download once a request is fulfilled → notify the user.


Environment Variables & Configuration

This skill depends on the following environment variables (configure them before first use):

| Variable | Required | Description | Example |
|----------|----------|-------------|---------|
| LIT_DOWNLOAD_DIR | yes | Directory where downloaded papers are saved | ~/Downloads |
| LIT_PROGRESS_FILE | yes | Path to the download progress tracking file | memory/literature-progress.md |
| LIT_CDP_PORT | no | Browser remote debugging port (default 9334) | 9334 |
| LIT_NOTIFY_CHANNEL | no | Notification channel (e.g. wechat-access, telegram) | wechat-access |
| LIT_NOTIFY_USER | no | Notification target user ID | your-user-id |
| SEMANTIC_SCHOLAR_API_KEY | no | Semantic Scholar API key | your-key |
| LIT_UNPAYWALL_EMAIL | no | Email required by the Unpaywall API | your-email@example.com |

On first use, the AI should check whether the variables above are configured. If any are missing, proactively ask the user and guide them through configuration. If the user has not configured a notification channel, skip the notification step and report results only in the conversation.
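That first-use check could look like this minimal sketch (the helper name is mine, not the skill's; only the two required variables from the table are hard-coded):

```python
import os

# Names of the variables the skill declares as required.
REQUIRED_VARS = ["LIT_DOWNLOAD_DIR", "LIT_PROGRESS_FILE"]

def missing_required_vars(environ) -> list:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example: check the real process environment and report gaps.
missing = missing_required_vars(os.environ)
if missing:
    print("Please configure:", ", ".join(missing))
```

Optional variables (notification channel, API keys) would be checked the same way but only trigger a degraded mode, not a hard stop.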


Pipeline Overview

1. Search  →  2. Present results  →  3. User confirms  →  4a. Direct download succeeds
                                                              ↓ (on failure)
                                                          4b. Ablesci (科研通) help request
                                                              ↓
                                                          5. Set up cron monitoring
                                                              ↓
                                                          6. Fulfilled → auto-download → notify user
                                                              ↓
                                                          7. Inform user + update progress

Step 1: Literature Search

First read the academic-literature-search skill; locate it as follows:

  1. Prefer skills/academic-literature-search/SKILL.md under the current workspace
  2. Otherwise try ~/.qclaw/skills/academic-literature-search/SKILL.md
  3. If neither exists, ask the user to install the academic-literature-search skill first

Run the search with its scripts/search.py:

import subprocess, os

# Locate the search script automatically
workspace = os.environ.get("OPENCLAW_WORKSPACE", os.path.expanduser("~/.qclaw/workspace"))
search_script = os.path.join(workspace, "skills/academic-literature-search/scripts/search.py")

result = subprocess.run([
    "python3", search_script,
    "--query", "the user's research topic",
    "--databases", "semantic_scholar,crossref",
    "--max_results", "20",
    "--output_format", "json"
], capture_output=True, text=True)

Prefer the Crossref database (its DOI data is the most authoritative and reliable).

As soon as the search completes, check each paper's is_open_access field:

  • is_open_access = true → mark as a candidate for direct download via Unpaywall
  • is_open_access = false → plan the Ablesci (科研通) help-request route directly, instead of wasting time on channels that will not work
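Assuming the search script prints a JSON array of paper records with an is_open_access field (the exact schema belongs to academic-literature-search), the triage above could be sketched as:

```python
import json

def triage_results(stdout: str):
    """Split the script's JSON output into (open_access, needs_help) lists."""
    papers = json.loads(stdout)
    open_access = [p for p in papers if p.get("is_open_access")]
    needs_help = [p for p in papers if not p.get("is_open_access")]
    return open_access, needs_help

# Usage with the subprocess result from above: triage_results(result.stdout)
```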

Step 2: Present Results

Present search results in the following format (Markdown):

## 📚 Search Results (N papers)

| # | Title | Authors | Year | Journal/Conf. | DOI | Citations | Open Access |
|---|-------|---------|------|---------------|-----|-----------|-------------|
| 1 | ... | ... | 2023 | ... | 10.xxxx/xxx | 45 | ✅ |

### 🎯 High-Value Recommendations

1. **[Paper title 1]** (reason for recommendation)
   - DOI: `10.xxxx/xxx`
   - Highlights: ...
2. **[Paper title 2]** (reason for recommendation)
   - DOI: `10.xxxx/xxx`

Recommendation criteria: high citation count / recent publication year / openly accessible / directly relevant to the user's topic


Step 3: Confirm the User's Needs

After presenting the results, ask the user: "Which papers would you like to download (by number or title), or shall I recommend some?"

Once the user replies, record for each target paper:

  • DOI, title, publication year
  • Whether it is open access
  • Download priority

Step 4a: Multi-Channel Direct Download

Try each channel in the following priority order:

Channel 1: Unpaywall (fastest)

GET https://api.unpaywall.org/v2/{DOI}?email={LIT_UNPAYWALL_EMAIL}
  • Take best_oa_location.landing_page or best_oa_location.url_for_pdf from the response
  • Note: Unpaywall is rate-limited (≤ 5000 requests per hour)
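A minimal sketch of this channel, assuming only the documented Unpaywall v2 response fields (the helper names are mine, not the skill's):

```python
import json
import urllib.request

def unpaywall_pdf_url(record: dict):
    """Pick a download candidate from a parsed Unpaywall response, preferring the direct PDF link."""
    loc = record.get("best_oa_location") or {}
    return loc.get("url_for_pdf") or loc.get("landing_page")

def lookup(doi: str, email: str):
    """Query Unpaywall for one DOI; returns a URL string or None."""
    url = f"https://api.unpaywall.org/v2/{doi}?email={email}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return unpaywall_pdf_url(json.load(resp))
```

Keeping the response parsing in its own function makes the channel testable without network access.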

Channel 2: DOI.org redirect

GET https://doi.org/{DOI}
(follow redirects, looking for a final URL with Content-Type: application/pdf)
  • If it redirects to Springer/IEEE/Elsevier → expect a 418 or a login wall → abandon this channel

Channel 3: Semantic Scholar PDF

GET https://api.semanticscholar.org/graph/v1/paper/{DOI}/PDF
(requires an API key: SEMANTIC_SCHOLAR_API_KEY)

Channel 4: Crossref PDF link

GET https://api.crossref.org/works/{DOI}
(take the `link` field from the response)

Success criterion: the file starts with %PDF (magic bytes) and is larger than 50 KB.
Failure handling: record the reason (418 / 403 / 404 / no OA version) and move on to Step 4b.
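The success criterion can be checked with a small helper (a sketch; the function name is mine):

```python
from pathlib import Path

MIN_SIZE = 50 * 1024  # 50 KB, per the success criterion above

def looks_like_pdf(path) -> bool:
    """True if the file starts with the %PDF magic bytes and exceeds 50 KB."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size <= MIN_SIZE:
        return False
    with p.open("rb") as f:
        return f.read(4) == b"%PDF"
```

This also catches the common failure mode where a publisher returns an HTML login page saved with a .pdf name.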


Step 4b: Ablesci (科研通) Help Request

Prerequisites

  • The user is logged in to a browser with the remote debugging port enabled (default LIT_CDP_PORT, usually 9334)
  • Reference command (Mac): "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" --remote-debugging-port=9334 "--remote-allow-origins=*"
  • Reference command (Linux): google-chrome --remote-debugging-port=9334 --remote-allow-origins=*

Step 4b-1: Publish a Help-Request Post

Connect to the browser via CDP and publish the request at https://www.ablesci.com/assist/create:

Key parameters (must be re-fetched from the page before every operation):

  1. CSRF token: extract it from the page's <meta name="csrf-token"> tag
  2. Cookies: fetch all cookies under the ablesci.com domain via Network.getAllCookies
  3. Tab: call list_tabs() and attach again before every operation; do not cache tab IDs
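Once CDP has returned the page source, the token extraction in point 1 might look like this (a sketch; it assumes the name attribute precedes content, as in typical page markup):

```python
import re

# Matches <meta name="csrf-token" content="..."> and captures the token.
CSRF_RE = re.compile(r'<meta\s+name="csrf-token"\s+content="([^"]+)"')

def extract_csrf_token(html: str):
    """Return the csrf-token meta content from page HTML, or None if absent."""
    m = CSRF_RE.search(html)
    return m.group(1) if m else None
```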

Publish API

POST https://www.ablesci.com/assist/create
Content-Type: application/x-www-form-urlencoded

_csrf={token}&title={title}&content={content}&tag_id={category ID}

Suggested post title format: 【求助全文】【journal name + year】paper title
Suggested post content: include the DOI, paper title, authors, and publication details

Step 4b-2: Record the Request Status

After publishing a request for each paper, update the download progress table (path: LIT_PROGRESS_FILE):

## 📥 Literature Download Progress

| Paper | DOI | Status | Source | Notes |
|-------|-----|--------|--------|-------|
| Paper title 1 | 10.xxxx/xxx | ✅ Downloaded | ablesci fulfillment | save path |
| Paper title 2 | 10.xxxx/xxx | ⏳ Requested | 科研通 | ID: xxx |
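Updating a row in that progress table could be sketched as follows (the helper is illustrative, not the skill's; it matches rows by the DOI column):

```python
def update_status(table: str, doi: str, new_status: str) -> str:
    """Set the Status cell of the markdown row whose DOI column matches `doi`."""
    out = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # A data row splits into: '', Paper, DOI, Status, Source, Notes, ''
        if len(cells) >= 7 and cells[2] == doi:
            cells[3] = new_status
            line = "| " + " | ".join(cells[1:-1]) + " |"
        out.append(line)
    return "\n".join(out)
```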

Step 5: Set Up the Cron Monitoring Task

Read qclaw-cron-skill to get the correct cron configuration syntax.

Path: ~/Library/Application Support/QClaw/openclaw/config/skills/qclaw-cron-skill/SKILL.md

Monitoring task configuration

Check the Ablesci (科研通) request status every 30 minutes:

Schedule

{"kind": "every", "everyMs": 1800000}

Payload (isolated session)

{
  "kind": "agentTurn",
  "message": "Check the status of the Ablesci (科研通) help-request posts...\n\n1. Read the progress file (LIT_PROGRESS_FILE) for the current state\n2. Connect to the browser via CDP (http://127.0.0.1:{LIT_CDP_PORT})\n3. Visit each request's detail page (URL format: https://www.ablesci.com/assist/detail?id={post ID})\n4. Check each paper's status:\n   - Requested → no action\n   - Pending confirmation (someone has uploaded) → auto-download (see the download flow below)\n   - Completed → no action\n5. If there is a new fulfillment (status: pending confirmation):\n   a. Extract the download-page link\n   b. Trigger the download through the browser\n   c. Update the progress file\n   d. Notify the user (if a notification channel is configured)\n6. Once every paper is downloaded, notify the user and delete the cron monitoring task"
}

Delivery (set only when a notification channel is configured):

{
  "mode": "announce",
  "channel": "{LIT_NOTIFY_CHANNEL}",
  "to": "{LIT_NOTIFY_USER}"
}

Note: sessionTarget = "isolated" (required), payload.kind = "agentTurn"

Notification template

📥 Paper downloaded!

Paper: {title}
Source: {source}
Saved to: {LIT_DOWNLOAD_DIR}/{filename}
Status: {progress-table update}

Step 6 & 7: Auto-Download and Progress Update

When the cron task detects a new fulfillment:

  1. Extract the download ID: parse the download link out of the detail page HTML
  2. Trigger the download:
    • Open the download link in a new tab
    • Call Page.setDownloadBehavior(behavior=allow, downloadPath={LIT_DOWNLOAD_DIR})
    • Wait for the file to change from .crdownload to .pdf (usually 5-30 seconds)
  3. Rename the file: strip suffixes like (科研通-ablesci.com), keeping the year information
  4. Validate the PDF: the file header is %PDF and the size is > 50 KB
  5. Update the progress table: LIT_PROGRESS_FILE
  6. Notify the user (if a notification channel is configured)
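Step 3's renaming can be sketched with a one-line substitution (illustrative; only the literal suffix named above is handled):

```python
import re

# Strip the "(科研通-ablesci.com)" marker that fulfilled downloads carry.
SUFFIX_RE = re.compile(r"\s*\(科研通-ablesci\.com\)")

def clean_filename(name: str) -> str:
    """Remove the Ablesci marker from a downloaded file's name, keeping the rest."""
    return SUFFIX_RE.sub("", name)
```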

Gotchas from Real-World Use

CDP operation timing

  • Accessibility tree refs and backendDOMNodeId values become stale across calls
  • Fix: re-fetch refs before every operation; never cache them across steps
  • Prefer DOM.querySelectorAll + DOM.resolveNode to obtain an objectId, then send Input.dispatchMouseEvent

Ablesci (科研通) download quirks

  • The file/request-download-token API returns code=0 but no URL
  • The actual download is streamed via a browser XHR; trigger it, then wait for the browser to download automatically
  • file_server=2 is the standard line, file_server=3 the high-speed line

Common download failures

  • Commercial publishers (IEEE/Elsevier, etc.): return 418 (IP/region restriction) or 403 (login required)
  • No open-access version: Unpaywall finds nothing → go straight to Ablesci (科研通)

Dependent Skills

| Skill | Purpose | Install |
|-------|---------|---------|
| academic-literature-search | Crossref/Semantic Scholar literature search | skillhub install academic-literature-search |
| browser-cdp | CDP browser automation | built in, or skillhub install browser-cdp |
| qclaw-cron-skill | Scheduled task management | built in |
