Openclaw Paid Db Access


Search and extract papers from paid academic databases via Browser Relay with low-token evaluate scripts. Currently fully tested on IEEE Xplore only; WoS, Scopus, ACM, and CNKI extractors are provided as templates (untested — verify selectors before use). Includes a guide for adding new databases. Use when the user asks to search for academic papers on IEEE or other paid databases behind an institutional login. Supports deduplication, multi-page extraction, and arXiv matching.

Install

openclaw skills install paid-db-access

Paid Database Access · 付费数据库访问

Let AI search paywalled academic databases through your real browser — 30× fewer tokens than snapshot, 10× faster. 让 AI 通过你的真实浏览器高效访问付费学术数据库——Token 消耗降低 30 倍,速度提升 10 倍。


Quick Verification · 快速验证

First run — verify end-to-end in 30 seconds / 首次运行 30 秒验证:

  1. Open browser → log into IEEE Xplore (institutional SSO) → click Relay icon (it turns bright) / 打开浏览器 → 登录 IEEE → 点 Relay 图标变亮
  2. Ask: "search IEEE for 'large language model scientific writing'" / 对 AI 说:「在 IEEE 搜索 large language model scientific writing」
  3. If you get structured paper list → everything works ✅ / 拿到结构化论文列表 → 一切就绪

Status · 状态: IEEE Xplore ✅ verified · CNKI 知网 / WoS / Scopus / ACM 📋 template ready, pending verification


Prerequisites · 前置条件

  1. Browser Relay extension installed / Browser Relay 扩展已安装 (from OpenClaw assets/chrome-extension/)
  2. Extension configured with Gateway URL + Token (from ~/.openclaw/openclaw.json) / 扩展已配置 Gateway URL + Token
  3. User logged into target database via institutional SSO / 用户已通过学校 SSO 登录目标数据库
  4. Relay icon activated on the database tab (bright/colored) / Relay 图标已激活(亮色)

Workflow · 使用流程

Step 1: Verify connection · 验证连接

browser.status → profile: "chrome", running: true, cdpReady: true

If offline, tell the user / 若不在线,提示用户: "Open browser → log into database → click Relay icon / 请打开浏览器 → 登录数据库 → 点击 Relay 图标激活"
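
The readiness check can be written as a small predicate. This is a minimal sketch that assumes the `browser.status` result is available as a dictionary carrying the three fields shown above; the helper name `is_relay_ready` is hypothetical and not part of the skill.

```python
def is_relay_ready(status: dict) -> bool:
    """Return True only when the Relay-attached Chrome profile is usable."""
    return (
        status.get("profile") == "chrome"
        and status.get("running") is True
        and status.get("cdpReady") is True
    )


OFFLINE_HINT = "Open browser → log into database → click Relay icon"

# Example: the browser is running but CDP is not attached yet.
status = {"profile": "chrome", "running": True, "cdpReady": False}
if not is_relay_ready(status):
    print(OFFLINE_HINT)
```

Requiring all three fields keeps the agent from navigating while the Relay icon is still inactive.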

Step 2: Search strategy · 搜索策略

Think before you navigate / 导航前先想好搜索策略:

| Situation / 情况 | Action / 策略 |
| --- | --- |
| Non-English concept (e.g. 自动生成科研论文) / 非英语概念 | Translate first → split into 2-3 complementary queries / 先翻译 → 拆成 2-3 个互补搜索词 |
| Results > 500 | Add year/type filters or tighten query / 加过滤或收紧搜索词 |
| Results < 5 | Broaden query (remove quotes, add synonyms, expand year range) / 放宽搜索词 |
| Results = 0 | Switch database or try broader keywords / 换数据库或去掉引号 |
| No good results on first try | Use simple evaluate to inspect page content, then adjust / 用简单 evaluate 探路后再调 |

Prefer Semantic Scholar + arXiv APIs for free/open papers first; use Browser Relay only for paywalled content, old papers, patents, citation reports, or Chinese papers. / 优先用 Semantic Scholar + arXiv API 搜免费论文;Browser Relay 仅用于付费内容、老论文、专利、引文报告、中文论文。
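
The result-count rules in the strategy table can be sketched as a single decision function. The function name and the returned labels are hypothetical; the thresholds (0, 5, 500) come from the table, and the input is assumed to be whatever the extractor reports in `totalResults` (a number, a numeric string, or `"?"`).

```python
def next_search_action(total_results) -> str:
    """Map a result count (int, numeric string, or '?') to the table's advice."""
    try:
        n = int(str(total_results).replace(",", ""))
    except ValueError:
        return "probe"      # count unknown: inspect the page with a simple evaluate
    if n == 0:
        return "switch"     # switch database or try broader keywords
    if n < 5:
        return "broaden"    # remove quotes, add synonyms, expand year range
    if n > 500:
        return "narrow"     # add year/type filters or tighten the query
    return "extract"        # manageable result set: proceed to extraction


for count in ("59", 0, 3, "1,234", "?"):
    print(count, "→", next_search_action(count))
```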

Step 3: Navigate to search · 导航到搜索

Construct search URL with query + filters directly — never simulate clicking the search box. / 用 URL 参数直接构造搜索——不要模拟点击搜索框。

navigate to: https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=<query>&ranges=<year1>_<year2>_Year

| Database · 数据库 | Search URL | Page param · 翻页参数 |
| --- | --- | --- |
| IEEE Xplore | .../search/searchresult.jsp?queryText={q}&ranges={y1}_{y2}_Year | &pageNumber={n} |
| ACM DL | .../action/doSearch?AllField={q} | &startPage={n} |
| Scopus | .../results/results.uri?query={q} | &offset={n} |
| WoS | .../wos/woscc/basic-search | (requires interaction · 需交互) |
| CNKI 知网 | .../kns8s/AdvSearch | (requires interaction · 需交互) |
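
Building the IEEE search URL by hand is easy to get wrong once the query contains spaces or quotes. The sketch below percent-encodes the query with the standard library; the parameter names (`queryText`, `ranges`, `pageNumber`) are the ones listed above, while the function name is hypothetical and the choice of `%20` over `+` for spaces is an assumption.

```python
from urllib.parse import quote, urlencode

IEEE_SEARCH = "https://ieeexplore.ieee.org/search/searchresult.jsp"


def ieee_search_url(query, year_from=None, year_to=None, page=1):
    """Build an IEEE Xplore search URL with optional year range and page."""
    params = {"queryText": query}
    if year_from and year_to:
        params["ranges"] = f"{year_from}_{year_to}_Year"
    if page > 1:
        params["pageNumber"] = page
    # quote_via=quote encodes spaces as %20 instead of '+'
    return f"{IEEE_SEARCH}?{urlencode(params, quote_via=quote)}"


print(ieee_search_url("large language model", 2020, 2024))
print(ieee_search_url("large language model", 2020, 2024, page=2))
```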

Step 4: Load extractor · 加载提取脚本

Read from extractors/ — never write JS manually. / 从 extractors/ 目录读取——不要手写 JS。

read extractors/ieee.js    # IEEE Xplore
read extractors/acm.js     # ACM Digital Library
read extractors/scopus.js  # Scopus
read extractors/wos.js     # Web of Science
read extractors/cnki.js    # CNKI (中国知网)

Step 5: Extract + auto-paginate · 提取 + 自动翻页

Use browser.act(kind="evaluate", fn=<script>). Scripts have built-in deduplication (by link), so count = exact paper count. / 脚本已内置去重(按 link),count 精确等于实际论文数。

browser.act(profile="chrome", targetId="<ID>", kind="evaluate", fn="<extractor content>")

Pagination logic · 翻页逻辑: The extractor returns totalPages, currentPage, perPage. If totalPages > 1:

Page 1 evaluate → { totalResults: "59", count: 25, totalPages: 3 }
  → navigate: URL + &pageNumber=2 → evaluate Page 2
  → navigate: URL + &pageNumber=3 → evaluate Page 3
  → Merge all papers (cross-page dedup by link)

If totalPages is "?", try &pageNumber=2 manually — if 404 or empty, there's only 1 page. / 如果 totalPages 是 ?,手动试 &pageNumber=2,若 404 或空则只有 1 页。
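
The pagination loop can be sketched as two helpers: one that plans the remaining page URLs from the first page's result, and one that merges pages with cross-page deduplication by link. Both function names are hypothetical, and the result shape is assumed to match the extractor output shown above.

```python
def remaining_page_urls(base_url: str, first_page: dict) -> list:
    """URLs for pages 2..totalPages; empty when totalPages is unknown ('?')."""
    total = first_page.get("totalPages")
    if not isinstance(total, int):
        return []  # unknown: probe &pageNumber=2 manually instead
    return [f"{base_url}&pageNumber={n}" for n in range(2, total + 1)]


def merge_pages(pages: list) -> list:
    """Concatenate papers from every page, keeping the first copy of each link."""
    seen, merged = set(), []
    for page in pages:
        for paper in page.get("papers", []):
            link = paper.get("link")
            if link and link in seen:
                continue  # already collected on an earlier page
            if link:
                seen.add(link)
            merged.append(paper)
    return merged


first = {"totalResults": "59", "count": 25, "totalPages": 3}
print(remaining_page_urls("https://example.org/search?q=x", first))
```

Papers without a link are kept rather than collapsed into one entry, since there is nothing to compare them on.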

Step 6: arXiv matching · arXiv 匹配

Match extracted papers to free arXiv versions: / 匹配提取到的论文到 arXiv 免费版:

echo '<papers JSON array>' | python scripts/arxiv_match.py --delay 1.5

| Papers · 论文数 | Strategy · 策略 |
| --- | --- |
| ≤ 10 | Sync match, reply together / 同步匹配,一起回复 |
| > 10 | Reply with results first, match arXiv in background, append PDF links / 先回复主结果,后台异步匹配后追加 PDF 链接 |

  • HIGH confidence → provide PDF link / 提供 PDF 直链
  • MEDIUM confidence → provide PDF with "verify manually" warning / 提供 PDF 但标注需核实
  • LOW confidence → no PDF link, mark "verify manually" / 不提供链接
  • No match → mark "no arXiv version" / 无 arXiv 版本
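
The confidence rules and the 10-paper threshold can be sketched as follows. The output format of `arxiv_match.py` is not shown in this document, so the confidence strings (`"HIGH"`, `"MEDIUM"`, `"LOW"`) and both function names are assumptions made for illustration.

```python
def present_arxiv_match(confidence, pdf_url=None):
    """Turn a match confidence into the text shown next to a paper."""
    level = (confidence or "").upper()
    if level == "HIGH" and pdf_url:
        return pdf_url
    if level == "MEDIUM" and pdf_url:
        return f"{pdf_url} (verify manually)"
    if level == "LOW":
        return "verify manually"      # no PDF link for low-confidence matches
    return "no arXiv version"


def match_mode(paper_count):
    """Up to 10 papers: match before replying; more: match in the background."""
    return "sync" if paper_count <= 10 else "background"


print(match_mode(25), present_arxiv_match("MEDIUM", "https://arxiv.org/pdf/0000.00000"))
```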

Step 7: Present results · 呈现结果

Consolidate papers from all pages + databases, deduplicate globally, present to user with arXiv PDF links where available. / 汇总所有页面+数据库的论文,全局去重后呈现,附带 arXiv PDF 直链。
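
Links differ between databases, so link-based deduplication alone will not merge the same paper found on two sites. One possible approach, sketched below, keys on the DOI first, then on a normalized title, then on the link. This key order is an assumption and is not specified by the skill, and the function names are hypothetical.

```python
import re


def dedup_key(paper):
    """Prefer DOI, then a normalized title, then the link."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # \W is Unicode-aware, so CJK titles survive normalization
    title = re.sub(r"\W+", " ", (paper.get("title") or "").lower()).strip()
    if title:
        return "title:" + title
    return "link:" + (paper.get("link") or "")


def dedup_global(papers):
    """Keep the first paper seen for each key, preserving input order."""
    unique = {}
    for paper in papers:
        unique.setdefault(dedup_key(paper), paper)
    return list(unique.values())


papers = [
    {"title": "A Study", "doi": "10.1/X", "link": "a"},
    {"title": "A study", "doi": "10.1/x", "link": "b"},
    {"title": "Other", "link": "c"},
]
print(len(dedup_global(papers)))
```

A known gap in this sketch: a record with a DOI and a record of the same paper without one get different keys and are not merged.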

Token efficiency · Token 效率: evaluate ≈ 500 tokens/page vs snapshot ≈ 15,000 tokens/page — 30× saving. / evaluate 约 500 tokens/页 vs snapshot 约 15,000 tokens/页——节省 30 倍
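
As a worked example of the arithmetic, the per-page figures above scale linearly with the number of pages. Both per-page numbers are the approximate values quoted in this document, not measurements made here.

```python
EVALUATE_TOKENS_PER_PAGE = 500      # approximate figure quoted above
SNAPSHOT_TOKENS_PER_PAGE = 15_000   # approximate figure quoted above


def token_cost(pages, per_page):
    """Total tokens for reading `pages` result pages."""
    return pages * per_page


pages = 3  # e.g. 59 results at 25 per page
evaluate_cost = token_cost(pages, EVALUATE_TOKENS_PER_PAGE)
snapshot_cost = token_cost(pages, SNAPSHOT_TOKENS_PER_PAGE)
print(evaluate_cost, snapshot_cost, snapshot_cost // evaluate_cost)  # 1500 45000 30
```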


Extractor Spec · 提取脚本规范

All extractors/*.js must follow v2 spec: built-in dedup + pagination info. / 所有提取脚本必须遵循 v2 规范:内置去重 + 翻页信息

(() => {
    const seen = new Set();   // dedup by link / 按 link 去重
    const results = [];

    // Selector priority: try most specific first, pick the one with most matches
    // 选择器策略:优先最精确的,选匹配最多的

    // ... extraction logic / 提取逻辑 ...

    return {
        totalResults: "number or '?' / 数字或'?'",
        count: results.length,
        totalPages: "number or '?' / 数字或'?'",
        currentPage: 1,
        perPage: 25,
        database: 'ieee',
        papers: results
    };
})()

Standard paper fields · 标准论文字段:

{
    "title": "Paper title · 论文标题",
    "authors": "Author1; Author2",
    "year": "2024",
    "venue": "Journal / Conference name · 期刊/会议名",
    "type": "Journal Article | Conference Paper | ...",
    "link": "Original URL · 原文链接",
    "doi": "DOI (if available · 如有)",
    "abstract": "Abstract snippet (if available · 如有)",
    "citations": "Citation count (if available · 如有)"
}
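
A quick shape check on the extractor's return value can catch spec violations before results reach the user. The sketch below validates the top-level keys from the v2 template and the dedup guarantee; the function name and the exact set of checks are assumptions, not part of the spec.

```python
REQUIRED_KEYS = {"totalResults", "count", "totalPages", "currentPage",
                 "perPage", "database", "papers"}


def check_v2_result(result):
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    missing = REQUIRED_KEYS - set(result)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    papers = result.get("papers", [])
    if result.get("count") != len(papers):
        problems.append("count does not match len(papers)")
    links = [p.get("link") for p in papers if p.get("link")]
    if len(links) != len(set(links)):
        problems.append("duplicate links: dedup is not working")
    if any(not p.get("title") for p in papers):
        problems.append("paper without a title")
    return problems


good = {"totalResults": "1", "count": 1, "totalPages": 1, "currentPage": 1,
        "perPage": 25, "database": "ieee",
        "papers": [{"title": "T", "link": "https://example.org/1"}]}
print(check_v2_result(good))
```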

For databases without an existing extractor, first probe with simple evaluate: / 对没有提取脚本的数据库,先用简单 evaluate 探路:

() => { return {
    title: document.title,
    bodyClasses: document.body.className,
    mainSelectors: Array.from(document.querySelectorAll('h1,h2,h3')).map(h=>h.innerText).slice(0,10)
};}

Error Handling · 错误处理

| Error · 错误 | Cause · 原因 | Fix · 解决方案 |
| --- | --- | --- |
| browser.status → running: false | Relay not activated / 未激活 | Click Relay icon on browser tab / 点击浏览器标签页 Relay 图标 |
| tabs: [] | No attached tab / 无附加标签页 | Same as above / 同上 |
| navigate returns 418 | Cloudflare block / 被拦截 | Cookie expired, re-login / Cookie 过期,重新登录 |
| evaluate returns count: 0 | Selector mismatch / 选择器不匹配 | Probe page first: () => ({title: document.title, text: document.body.innerText.substring(0,500)}) then adjust selectors / 先用简单 JS 探路再调选择器 |
| evaluate returns undefined | JS syntax error / 语法错误 | Test script in browser Console first / 先在浏览器 Console 验证 |
| Page title contains "Sign In" | Login lost / 登录失效 | Re-login / 提示用户重新登录 |
| Can't reach browser control service | Gateway down | Run openclaw gateway restart / 运行 openclaw gateway restart |
| evaluate: tab not found | CDP not attached / CDP 未附加 | Click Relay icon on current tab / 在当前标签页点击 Relay 图标 |
| count >> expected (e.g. 100 vs 25) | Old script without dedup / 旧版未去重 | Use v2 extractor (with new Set()) / 确认用了 v2 脚本 |
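
Several rows of the table can be checked mechanically after each evaluate call. The sketch below covers only the symptoms visible in the status, page title, and extractor result; the function name is hypothetical, a JavaScript `undefined` is assumed to arrive as `None`, and treating `count > perPage` as the sign of a missing dedup step is an assumption based on the "100 vs 25" example.

```python
def diagnose(status, page_title, result):
    """Return a suggested fix, or 'ok' when no known symptom is present."""
    if not status.get("running"):
        return "Click the Relay icon on the browser tab"
    if "Sign In" in page_title:
        return "Login lost: ask the user to re-login"
    if result is None:
        return "evaluate returned nothing: test the script in the browser Console"
    if result.get("count") == 0:
        return "Selector mismatch: probe the page, then adjust selectors"
    per_page = result.get("perPage")
    if isinstance(per_page, int) and result.get("count", 0) > per_page:
        return "count exceeds perPage: confirm the v2 extractor with dedup is in use"
    return "ok"


print(diagnose({"running": True}, "IEEE Xplore Search Results", {"count": 100, "perPage": 25}))
```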

Database Status · 数据库验证状态

⚠️ Honest disclosure · 诚实声明: Only IEEE Xplore has been fully tested end-to-end (search → extract → paginate → arXiv match). All other extractors are templates — written based on static page structure analysis, never run against a live logged-in search. Assume they need selector adjustments before they work. / 仅 IEEE Xplore 完成了端到端实测(检索→提取→翻页→arXiv 匹配)。其他所有数据库的提取脚本均为模板——基于静态页面结构分析编写,未在真实登录检索环境中运行过。使用前预期需要调整选择器。

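| Database · 数据库 | Search · 检索 | Extraction · 提取 | Pagination · 翻页 | arXiv | Notes · 备注 |
| --- | --- | --- | --- | --- | --- |
| IEEE Xplore | ✅ | ✅ | ✅ | ✅ | Only fully tested DB · 唯一完整测试 |
| CNKI 知网 | 📋 | 📋 | | N/A | Template, needs campus VPN · 模板,需学校 VPN |
| Web of Science | 📋 | 📋 | 📋 | | Template, untested · 模板未测试 |
| Scopus | 📋 | 📋 | 📋 | | Template, untested · 模板未测试 |
| ACM DL | 📋 | 📋 | 📋 | | Template, untested · 模板未测试 |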

✅ = verified in live session · 📋 = template provided, needs verification


Adding a New Database · 添加新数据库

Want to add PubMed, JSTOR, ProQuest, or your university's custom repository? Here's the 4-step recipe. / 想添加 PubMed、JSTOR、ProQuest 或学校自建库?四步搞定。

1. Probe the search page · 探索搜索页

Log into the database in your browser, do a test search, then run: / 浏览器登录数据库,做一次测试搜索,然后执行:

// Paste into browser.act(kind="evaluate", fn=...) / 粘贴到 evaluate 中执行
() => { return {
    url: window.location.href,
    title: document.title,
    resultCount: document.querySelector('[class*=result], [class*=count]')?.innerText?.substring(0,200),
    itemSelector: (() => {
        // Try common patterns — find the one that matches paper cards
        for (const sel of [
            '[class*=result-item]', '[class*=search-result]',
            '.document-item', '[class*=record]', 'article',
            '.List-results-items > *', '.results > li'
        ]) {
            const n = document.querySelectorAll(sel).length;
            if (n >= 3) return sel + ' → ' + n + ' items';
        }
        return 'UNKNOWN — inspect manually';
    })(),
    sampleHTML: (() => {
        const first = document.querySelector('[class*=result-item], [class*=search-result], article, [class*=record]');
        return first?.innerHTML?.substring(0, 1000) || 'no match';
    })(),
    pagination: (() => {
        const nextBtn = document.querySelector('[class*=next], [class*=pagination] a:last-child, [aria-label*=next]');
        return nextBtn ? 'Found next button: ' + (nextBtn.href || nextBtn.outerHTML?.substring(0,100)) : 'No pagination found';
    })(),
    searchURL: (() => {
        // Check if URL contains search params (easy case) or is generic (hard case)
        const u = window.location.href;
        if (u.includes('query') || u.includes('search') || u.includes('q=')) return 'URL-based: ' + u.substring(0,200);
        return 'Form-based (may need POST) — current URL: ' + u.substring(0,200);
    })()
};}

2. Write the extractor · 编写提取脚本

Copy extractors/ieee.js as a starting point. Three things matter: / 复制 extractors/ieee.js 作为起点。三个关键点:

// a) Item selector — from step 1 probe results
const items = document.querySelectorAll('.your-result-item-selector');

// b) Inner selectors — open browser DevTools, inspect one paper card
const title = item.querySelector('.your-title-selector');
const authors = item.querySelector('.your-author-selector');
// ...

// c) Dedup key — mandatory for every extractor
const seen = new Set();  // key on link or DOI

Testing tip · 测试技巧: Before saving, paste your extractor into browser DevTools Console and check the output. / 保存前先粘贴到浏览器 Console 验证输出。

3. Find the pagination pattern · 找到翻页规律

Three common patterns — test which one works: / 三种常见模式——逐一测试:

// Pattern A: URL parameter (like IEEE: &pageNumber=3)
navigate to: baseURL + '&pageNumber=2'  // or &page=2, &start=25

// Pattern B: Offset parameter (like Scopus: &offset=25)
navigate to: baseURL + '&offset=25'

// Pattern C: Next button click (for JS-heavy sites)
browser.act(kind="click", ref="next-page-button")
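
The three patterns can be folded into one helper that returns the next URL, or nothing when a click is needed. This is a sketch: the function name and pattern labels are hypothetical, and the offset formula assumes a zero-based offset, so page 2 at 25 results per page maps to `offset=25` as in Pattern B.

```python
def next_page_target(base_url, pattern, page, per_page=25):
    """Return the URL for `page`, or None when the site needs a button click."""
    if pattern == "page":
        return f"{base_url}&pageNumber={page}"
    if pattern == "offset":
        return f"{base_url}&offset={(page - 1) * per_page}"
    if pattern == "click":
        return None  # no URL: click the next-page button instead
    raise ValueError(f"unknown pagination pattern: {pattern}")


base = "https://example.org/results?query=x"  # placeholder URL
print(next_page_target(base, "page", 2))
print(next_page_target(base, "offset", 2))
```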

4. Add to config.yaml · 写入配置

databases:
  your_db:
    name: "Database Name"
    enabled: true
    base_url: "https://..."
    search_url: "https://.../search?query={q}"
    page_param: "&page={n}"          # or "&start={n}" / "click"
    extractor: "extractors/your_db.js"
    cookies:
      required: ["SESSION_ID"]
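
To illustrate how the `{q}` and `{n}` placeholders might be filled in, the sketch below works on an already-parsed config entry. Loading the YAML file is left out, the host is a placeholder, and the function name is hypothetical. How the skill itself consumes `config.yaml` is not shown in this document.

```python
from urllib.parse import quote

# One parsed entry from the `databases:` mapping (placeholder values).
entry = {
    "search_url": "https://example.org/search?query={q}",
    "page_param": "&page={n}",
}


def search_url_from_config(entry, query, page=1):
    """Fill {q} with the encoded query and append the page parameter if any."""
    url = entry["search_url"].format(q=quote(query, safe=""))
    page_param = entry.get("page_param", "")
    # "click" means pagination is done by clicking, so there is no URL suffix
    if page > 1 and page_param and page_param != "click":
        url += page_param.format(n=page)
    return url


print(search_url_from_config(entry, "graph neural network", page=2))
```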

That's it. The flow is always: probe → write extractor → find pagination → add to config, testing after each step. / 流程永远是:探路 → 写提取脚本 → 找翻页规律 → 写入配置,每一步都要测试


Known Limitations · 已知限制

  • Browser dependency · 浏览器依赖: User's real browser must be online and logged in / 用户真实浏览器需在线且已登录
  • Cookie expiry · Cookie 过期: Session cookies expire; re-login required / 会话 Cookie 过期需重新登录
  • Selector fragility · 选择器脆弱: Database site redesigns may break extractors / 网站改版可能导致脚本失效
  • PDF download · PDF 下载受限: Institutional-level auth often required; arxiv_match.py provides free arXiv versions as workaround / 通过 arxiv_match.py 提供 arXiv 免费版替代
  • Not all DBs verified · 未全部实测: Only IEEE Xplore is fully tested; CNKI/WoS/Scopus/ACM are template-quality / 仅 IEEE Xplore 完整实测;知网/WoS/Scopus/ACM 为模板级别
  • Compliance · 合规: Respect database ToS. Don't bulk download. Don't share credentials. / 遵守数据库使用条款,不批量下载,不分享凭证

Project Structure · 项目结构

paid-db-access/
├── SKILL.md                  # This file · 本文件
├── config.yaml               # User config · 用户配置
├── extractors/               # DB-specific JS extractors
│   ├── ieee.js               # v2 (dedup + pagination), ✅ verified
│   ├── cnki.js               # v2, 📋 template
│   ├── acm.js                # v2, 📋 template
│   ├── scopus.js             # v2, 📋 template
│   └── wos.js                # v2, 📋 template
└── scripts/
    ├── cookie-extractor.py   # Extract minimal cookies from browser export
    └── arxiv_match.py        # Match papers to arXiv free PDFs

Tech Principle · 技术原理

Traditional snapshot:
  Walk every DOM node → serialize to accessibility tree → return entire tree
  遍历每个 DOM 节点 → 序列化成可访问性树 → 返回整棵树
  Cost: ~15,000 tokens/page

This skill (evaluate):
  Inject JS into page → extract only paper data → return structured JSON
  注入 JS 到页面 → 只提取论文数据 → 返回结构化 JSON
  Cost: ~500 tokens/page

Result: 30× token savings, 10× speed
结果: Token 节省 30 倍,速度提升 10 倍

Disclaimer · 免责声明

This project is for learning and research purposes only. / 本项目仅供学习和研究目的

  • You must have legitimate institutional access (e.g. university subscription). This is not a paywall bypass tool. / 你必须有合法的机构访问权限(如学校订阅)。本项目不是绕过付费墙的工具。
  • Database websites may change at any time, breaking extractors. / 各数据库网站可能随时改版,导致提取脚本失效。
  • cookie-extractor.py runs locally and does not upload data. Still, clear sensitive cookies after use. / cookie-extractor.py 在本地运行不会上传数据,但建议使用后清除敏感 Cookie。
  • Do not bulk download papers. Do not use for commercial purposes. Do not share your access credentials. / 不要批量下载论文。不要用于商业目的。不要分享访问凭证。

MIT License