Install
openclaw skills install paid-db-access
Search and extract papers from paid academic databases via Browser Relay with low-token evaluate scripts. Currently fully tested on IEEE Xplore only; WoS, Scopus, ACM, and CNKI extractors are provided as templates (untested; verify selectors before use). Includes a guide for adding new databases. Use when the user asks to search for academic papers on IEEE or other paid databases behind an institutional login. Supports deduplication, multi-page extraction, and arXiv matching.
Let AI search paywalled academic databases through your real browser, using about 30× fewer tokens than snapshot and running about 10× faster.
First run (verify end-to-end in 30 seconds): tell the AI
"search IEEE for 'large language model scientific writing'"
Status: IEEE Xplore ✅ verified · CNKI/WoS/Scopus/ACM 📋 template ready, pending verification
assets/chrome-extension/)
~/.openclaw/openclaw.json): extension configured with Gateway URL + Token
browser.status → profile: "chrome", running: true, cdpReady: true
If offline: "Open browser → log into database → click Relay icon to activate"
Think before you navigate: decide on a search strategy first.
| Situation | Action |
|---|---|
| Non-English concept (e.g. 自动生成科研论文, "automatic generation of research papers") | Translate first, then split into 2-3 complementary queries |
| Results > 500 | Add year/type filters or tighten the query |
| Results < 5 | Broaden the query (remove quotes, add synonyms, expand year range) |
| Results = 0 | Switch database or try broader keywords (e.g. remove quotes) |
| No good results on first try | Use a simple evaluate to inspect page content, then adjust |
Prefer Semantic Scholar + arXiv APIs for free/open papers first; use Browser Relay only for paywalled content, old papers, patents, citation reports, or Chinese papers.
Construct the search URL with query + filters directly; never simulate clicking the search box.
navigate to: https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=<query>&ranges=<year1>_<year2>_Year
| Database | Search URL | Page param |
|---|---|---|
| IEEE Xplore | .../search/searchresult.jsp?queryText={q}&ranges={y1}_{y2}_Year | &pageNumber={n} |
| ACM DL | .../action/doSearch?AllField={q} | &startPage={n} |
| Scopus | .../results/results.uri?query={q} | &offset={n} |
| WoS | .../wos/woscc/basic-search (requires interaction) | - |
| CNKI 知网 | .../kns8s/AdvSearch (requires interaction) | - |
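The IEEE Xplore row above can be turned into a small URL builder. This is a minimal sketch, not code shipped with the skill: `buildIeeeSearchUrl` and its option names are hypothetical, and it assumes the site accepts percent-encoded query text.

```javascript
// Hypothetical helper: builds an IEEE Xplore search URL from the pattern
// in the table above. Option names (fromYear, toYear, page) are illustrative.
function buildIeeeSearchUrl(query, { fromYear, toYear, page } = {}) {
  const base = "https://ieeexplore.ieee.org/search/searchresult.jsp";
  // encodeURIComponent escapes spaces, double quotes, and non-ASCII characters.
  let url = `${base}?queryText=${encodeURIComponent(query)}`;
  if (fromYear !== undefined && toYear !== undefined) {
    url += `&ranges=${fromYear}_${toYear}_Year`;
  }
  if (page !== undefined && page > 1) {
    url += `&pageNumber=${page}`;
  }
  return url;
}

console.log(
  buildIeeeSearchUrl("large language model scientific writing", {
    fromYear: 2020,
    toYear: 2024,
  })
);
```

Building the URL in one place keeps the year filter and page number consistent between the first request and later pages.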
Read from extractors/; never write JS manually.
read extractors/ieee.js # IEEE Xplore
read extractors/acm.js # ACM Digital Library
read extractors/scopus.js # Scopus
read extractors/wos.js # Web of Science
read extractors/cnki.js # CNKI
Use browser.act(kind="evaluate", fn=<script>). Scripts have built-in deduplication (by link), so count equals the exact number of papers on the page.
browser.act(profile="chrome", targetId="<ID>", kind="evaluate", fn="<extractor content>")
Pagination logic: the extractor returns totalPages, currentPage, and perPage. If totalPages > 1:
Page 1 evaluate → { totalResults: "59", count: 25, totalPages: 3 }
→ navigate: URL + &pageNumber=2 → evaluate Page 2
→ navigate: URL + &pageNumber=3 → evaluate Page 3
→ Merge all papers (cross-page dedup by link)
If totalPages is "?", try &pageNumber=2 manually; if it returns 404 or an empty page, there is only 1 page.
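The pagination steps above could be sketched as two pure helpers: one that lists the URLs still to visit after page 1, and one that merges per-page results with cross-page dedup by link. Both function names are hypothetical, and the navigate/evaluate calls themselves are left to the caller.

```javascript
// Hypothetical helpers; names and shapes are illustrative, not part of the skill.

// URLs for pages 2..totalPages, given the page-1 URL and the page-1 result.
// Returns [] when totalPages is "?" or 1; in the "?" case the caller still
// needs to probe &pageNumber=2 manually, as described above.
function remainingPageUrls(baseUrl, firstResult) {
  const total = Number(firstResult.totalPages);
  if (!Number.isInteger(total) || total < 2) return [];
  const urls = [];
  for (let n = 2; n <= total; n++) urls.push(`${baseUrl}&pageNumber=${n}`);
  return urls;
}

// Merge per-page extractor results, dropping papers whose link was already seen.
function mergePages(pageResults) {
  const seen = new Set();
  const papers = [];
  for (const page of pageResults) {
    for (const paper of page.papers) {
      if (seen.has(paper.link)) continue;
      seen.add(paper.link);
      papers.push(paper);
    }
  }
  return papers;
}
```

Keeping these steps free of browser calls makes them easy to check in a plain Node session before wiring them to the relay.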
Match extracted papers to free arXiv versions:
echo '<papers JSON array>' | python scripts/arxiv_match.py --delay 1.5
| Papers | Strategy |
|---|---|
| ≤ 10 | Match synchronously and reply with everything together |
| > 10 | Reply with results first, match arXiv in the background, then append PDF links |
Consolidate papers from all pages + databases, deduplicate globally, and present them to the user with arXiv PDF links where available.
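Links differ between databases, so global deduplication needs a key other than link. The sketch below assumes a key that the source does not specify: DOI when present, otherwise a normalized title, otherwise the link. `dedupKey` and `mergeAcrossDatabases` are hypothetical names.

```javascript
// Assumed dedup key (not defined by the skill): DOI, else normalized title, else link.
function dedupKey(paper) {
  if (paper.doi) return "doi:" + paper.doi.trim().toLowerCase();
  // Keep letters and digits of any script, so non-English titles still produce a key.
  const title = (paper.title || "")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
  if (title) return "title:" + title;
  return "link:" + paper.link;
}

// Takes one array of papers per database and keeps the first copy of each paper.
function mergeAcrossDatabases(lists) {
  const byKey = new Map();
  for (const list of lists) {
    for (const paper of list) {
      const key = dedupKey(paper);
      if (!byKey.has(key)) byKey.set(key, paper);
    }
  }
  return [...byKey.values()];
}
```

A paper that carries a DOI in one database and none in another will not be merged by this key; a stricter version could index every paper under both its DOI and its title.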
Token efficiency: evaluate ≈ 500 tokens/page vs snapshot ≈ 15,000 tokens/page, a 30× saving.
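As a quick check of that arithmetic, the per-page figures quoted above can be scaled to a multi-page search. The numbers are the document's approximations; `estimateTokens` is a hypothetical helper.

```javascript
// Per-page costs as quoted above (approximate figures from this document).
const TOKENS_PER_PAGE = { evaluate: 500, snapshot: 15000 };

// Estimated token cost of reading `pages` result pages with each method.
function estimateTokens(pages) {
  const evaluate = pages * TOKENS_PER_PAGE.evaluate;
  const snapshot = pages * TOKENS_PER_PAGE.snapshot;
  return { evaluate, snapshot, saved: snapshot - evaluate, ratio: snapshot / evaluate };
}

console.log(estimateTokens(3));
// { evaluate: 1500, snapshot: 45000, saved: 43500, ratio: 30 }
```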
All extractors/*.js must follow the v2 spec: built-in dedup + pagination info.
(() => {
  const seen = new Set(); // dedup by link
  const results = [];
  // Selector priority: try the most specific first, pick the one with the most matches
  // ... extraction logic ...
  return {
    totalResults: "number or '?'",
    count: results.length,
    totalPages: "number or '?'",
    currentPage: 1,
    perPage: 25,
    database: 'ieee',
    papers: results
  };
})()
Standard paper fields:
{
  "title": "Paper title",
  "authors": "Author1; Author2",
  "year": "2024",
  "venue": "Journal / Conference name",
  "type": "Journal Article | Conference Paper | ...",
  "link": "Original URL",
  "doi": "DOI (if available)",
  "abstract": "Abstract snippet (if available)",
  "citations": "Citation count (if available)"
}
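A returned object can be checked against the v2 shape before it is trusted. The sketch below is one possible checker, not part of the skill: `validateExtractorResult` is a hypothetical name, and treating only title and link as mandatory per paper is an assumption based on the "if available" markers above.

```javascript
// True for "?", a non-negative integer, or a string of digits such as "59".
function isCountOrUnknown(value) {
  if (value === "?") return true;
  if (typeof value === "number") return Number.isInteger(value) && value >= 0;
  return typeof value === "string" && /^\d+$/.test(value);
}

// Hypothetical checker: returns a list of problems, empty when the result looks valid.
function validateExtractorResult(result) {
  if (!Array.isArray(result.papers)) return ["papers must be an array"];
  const problems = [];
  if (!isCountOrUnknown(result.totalResults)) problems.push("totalResults must be a number or '?'");
  if (!isCountOrUnknown(result.totalPages)) problems.push("totalPages must be a number or '?'");
  if (result.count !== result.papers.length) problems.push("count must equal papers.length");
  const links = new Set();
  result.papers.forEach((paper, i) => {
    if (!paper.title) problems.push(`papers[${i}] has no title`);
    if (!paper.link) {
      problems.push(`papers[${i}] has no link`);
      return;
    }
    // A repeated link means the extractor's built-in dedup did not run.
    if (links.has(paper.link)) problems.push(`papers[${i}] duplicates link ${paper.link}`);
    links.add(paper.link);
  });
  return problems;
}
```

Running such a check on each page makes a missing dedup step or a count mismatch visible immediately rather than after merging.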
For databases without an existing extractor, first probe with a simple evaluate:
() => { return {
title: document.title,
bodyClasses: document.body.className,
mainSelectors: Array.from(document.querySelectorAll('h1,h2,h3')).map(h=>h.innerText).slice(0,10)
};}
| Error | Cause | Fix |
|---|---|---|
| browser.status → running: false | Relay not activated | Click the Relay icon on the browser tab |
| tabs: [] | No attached tab | Same as above |
| navigate returns 418 | Cloudflare block | Cookie expired; log in again |
| evaluate returns count: 0 | Selector mismatch | Probe page first: () => ({title: document.title, text: document.body.innerText.substring(0,500)}) then adjust selectors |
| evaluate returns undefined | JS syntax error | Test the script in the browser Console first |
| Page title contains "Sign In" | Login lost | Ask the user to log in again |
| Can't reach browser control service | Gateway down | Run openclaw gateway restart |
| evaluate → tab not found | CDP not attached | Click the Relay icon on the current tab |
| count >> expected (e.g. 100 vs 25) | Old script without dedup | Use the v2 extractor (with new Set()) |
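Several rows of the table above can be folded into a single lookup so the first matching symptom decides the suggested fix. This is only a sketch: the observation object (`running`, `tabs`, `httpStatus`, `pageTitle`, `evaluated`, `result`) is a hypothetical shape, not the actual return value of browser.status or browser.act.

```javascript
// Hypothetical observation shape:
// { running, tabs, httpStatus, pageTitle, evaluated, result }
// Returns a suggested fix from the table above, or null if nothing matches.
function diagnose(obs) {
  if (obs.running === false) return "Relay not activated: click the Relay icon on the browser tab";
  if (Array.isArray(obs.tabs) && obs.tabs.length === 0) return "No attached tab: click the Relay icon on the browser tab";
  if (obs.httpStatus === 418) return "Blocked: cookie expired, log in again";
  if (/sign in/i.test(obs.pageTitle || "")) return "Login lost: log in again";
  if (obs.evaluated && obs.result === undefined) return "JS syntax error: test the script in the browser Console first";
  if (obs.evaluated && obs.result.count === 0) return "Selector mismatch: probe the page, then adjust selectors";
  return null;
}
```

Checks are ordered from connection problems to page-level problems, since a selector mismatch cannot be diagnosed while the relay is offline.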
⚠️ Honest disclosure: Only IEEE Xplore has been fully tested end-to-end (search → extract → paginate → arXiv match). All other extractors are templates, written from static page structure analysis and never run against a live logged-in search. Assume they need selector adjustments before they work.
| Database | Search | Extraction | Pagination | arXiv | Notes |
|---|---|---|---|---|---|
| IEEE Xplore | ✅ | ✅ | ✅ | ✅ | Only fully tested DB |
| CNKI 知网 | 📋 | 📋 | - | N/A | Template, needs campus VPN |
| Web of Science | 📋 | 📋 | - | 📋 | Template, untested |
| Scopus | 📋 | 📋 | - | 📋 | Template, untested |
| ACM DL | 📋 | 📋 | - | 📋 | Template, untested |
✅ = verified in live session · 📋 = template provided, needs verification
Want to add PubMed, JSTOR, ProQuest, or your university's custom repository? Here's the 4-step recipe.
Log into the database in your browser, do a test search, then run:
// Paste into browser.act(kind="evaluate", fn=...)
() => { return {
url: window.location.href,
title: document.title,
resultCount: document.querySelector('[class*=result], [class*=count]')?.innerText?.substring(0,200),
itemSelector: (() => {
// Try common patterns — find the one that matches paper cards
for (const sel of [
'[class*=result-item]', '[class*=search-result]',
'.document-item', '[class*=record]', 'article',
'.List-results-items > *', '.results > li'
]) {
const n = document.querySelectorAll(sel).length;
if (n >= 3) return sel + ' → ' + n + ' items';
}
return 'UNKNOWN — inspect manually';
})(),
sampleHTML: (() => {
const first = document.querySelector('[class*=result-item], [class*=search-result], article, [class*=record]');
return first?.innerHTML?.substring(0, 1000) || 'no match';
})(),
pagination: (() => {
const nextBtn = document.querySelector('[class*=next], [class*=pagination] a:last-child, [aria-label*=next]');
return nextBtn ? 'Found next button: ' + (nextBtn.href || nextBtn.outerHTML?.substring(0,100)) : 'No pagination found';
})(),
searchURL: (() => {
// Check if URL contains search params (easy case) or is generic (hard case)
const u = window.location.href;
if (u.includes('query') || u.includes('search') || u.includes('q=')) return 'URL-based: ' + u.substring(0,200);
return 'Form-based (may need POST) — current URL: ' + u.substring(0,200);
})()
};}
Copy extractors/ieee.js as a starting point. Three things matter:
// a) Item selector — from step 1 probe results
const items = document.querySelectorAll('.your-result-item-selector');
// b) Inner selectors — open browser DevTools, inspect one paper card
const title = item.querySelector('.your-title-selector');
const authors = item.querySelector('.your-author-selector');
// ...
// c) Dedup key — mandatory for every extractor
const seen = new Set(); // key on link or DOI
Testing tip: Before saving, paste your extractor into the browser DevTools Console and check the output.
Three common patterns; test which one works:
// Pattern A: URL parameter (like IEEE: &pageNumber=3)
navigate to: baseURL + '&pageNumber=2' // or &page=2, &start=25
// Pattern B: Offset parameter (like Scopus: &offset=25)
navigate to: baseURL + '&offset=25'
// Pattern C: Next button click (for JS-heavy sites)
browser.act(kind="click", ref="next-page-button")
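Patterns A and B differ only in how the page number maps to a URL parameter. A minimal sketch follows, assuming the offset counts results skipped (so page 2 with 25 results per page gives offset 25, matching the example above). `pageUrl` and its pattern labels are hypothetical.

```javascript
// Hypothetical helper covering the two URL-based patterns above.
// pattern: "pageNumber" (Pattern A) or "offset" (Pattern B).
function pageUrl(baseUrl, pattern, pageNumber, perPage = 25) {
  if (pattern === "pageNumber") {
    return `${baseUrl}&pageNumber=${pageNumber}`;
  }
  if (pattern === "offset") {
    // Assumption: offset = number of results to skip, starting at 0 for page 1.
    return `${baseUrl}&offset=${(pageNumber - 1) * perPage}`;
  }
  throw new Error(`unsupported pattern: ${pattern}`);
}
```

Pattern C has no URL form, so it stays a click action and is deliberately rejected here.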
databases:
  your_db:
    name: "Database Name"
    enabled: true
    base_url: "https://..."
    search_url: "https://.../search?query={q}"
    page_param: "&page={n}" # or "&start={n}" / "click"
    extractor: "extractors/your_db.js"
    cookies:
      required: ["SESSION_ID"]
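To show how the {q} and {n} placeholders in such an entry might be expanded, here is a small sketch. It assumes the YAML entry has already been parsed into a plain object; `urlFromConfig` is a hypothetical helper, and the example.org URL is a placeholder rather than a real database.

```javascript
// entry mirrors one item under "databases:" in config.yaml, already parsed.
// Returns the URL for the given page, or null when paging requires a click.
function urlFromConfig(entry, query, page = 1) {
  for (const key of ["search_url", "page_param", "extractor"]) {
    if (!entry[key]) throw new Error(`config entry is missing ${key}`);
  }
  let url = entry.search_url.replace("{q}", encodeURIComponent(query));
  if (page > 1) {
    if (entry.page_param === "click") return null; // no URL form for this site
    url += entry.page_param.replace("{n}", String(page));
  }
  return url;
}

// Placeholder entry for illustration only.
const exampleEntry = {
  search_url: "https://example.org/search?query={q}",
  page_param: "&page={n}",
  extractor: "extractors/your_db.js",
};
console.log(urlFromConfig(exampleEntry, "deep learning", 3));
```

When page_param is an offset form such as "&start={n}", the value passed for n would need to be a result offset rather than a page number.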
That's it. The flow is always: probe → write extractor → find pagination → test.
arxiv_match.py provides free arXiv versions as a workaround
paid-db-access/
├── SKILL.md # This file
├── config.yaml # User config
├── extractors/ # DB-specific JS extractors
│ ├── ieee.js # ✅ v2 — dedup + pagination
│ ├── cnki.js # ✅ v2
│ ├── acm.js # ✅ v2
│ ├── scopus.js # ✅ v2
│ └── wos.js # ✅ v2
└── scripts/
├── cookie-extractor.py # Extract minimal cookies from browser export
└── arxiv_match.py # Match papers to arXiv free PDFs
Traditional snapshot:
Walk every DOM node → serialize to accessibility tree → return entire tree
Cost: ~15,000 tokens/page
This skill (evaluate):
Inject JS into page → extract only paper data → return structured JSON
Cost: ~500 tokens/page
Result: 30× token savings, 10× speed
This project is for learning and research purposes only.
cookie-extractor.py runs locally and does not upload data. Still, clear sensitive cookies after use.
MIT License