AlphaPai 评论抓取 (AlphaPai Commentary Scraper)

v0.2.0

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for clawdbotrr/alphapai-scraper.

Install the skill "AlphaPai 评论抓取" (clawdbotrr/alphapai-scraper) from ClawHub.
Skill page: https://clawhub.ai/clawdbotrr/alphapai-scraper
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install alphapai-scraper

ClawHub CLI

npx clawhub@latest install alphapai-scraper

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

ℹ

Purpose & Capability

The name/description (scrape AlphaPai, index and summarize comments) aligns with the code: Playwright-based scraping, SQLite+FTS5 indexing, optional local vector index and Feishu posting. However, the package declares no required env vars/binaries, while the SKILL.md and code repeatedly reference USER_AUTH_TOKEN, cookie files, a local Chrome profile, and storage_state, and rely on local CLIs (openclaw, clawhub). Users should be aware of this inconsistency between the declared metadata (no credentials or dependencies) and what the code actually needs.

ℹ

Instruction Scope

Runtime instructions explicitly direct the agent/user to read/save tokens, cookies, storage_state, and optionally reuse the local Chrome Profile; these are sensitive but consistent with a site-login scraper. The skill also offers a bootstrap routine that opens a real browser and saves storage state/cookies. There is no instruction to exfiltrate secrets, but the feature to send summaries to an external Feishu webhook (if configured) means collected content could be transmitted externally — the webhook is optional and disabled by default.

Install Mechanism

The registry lists no install spec, yet the bundle includes many runnable Python scripts that import Playwright, chromadb, sentence_transformers/torch and call local CLIs (openclaw, clawhub). There is no packaged dependency list or guidance in SKILL.md about installing these third-party libraries or the CLIs. That mismatch (runnable code without declared install steps) increases the risk of runtime surprises and hidden dependency installation.

Credentials

Although the skill doesn't declare required env vars in metadata, the code and SKILL.md expect/encourage providing sensitive auth material: USER_AUTH_TOKEN (env or token file), cookies.json, account username/password, storage_state, and even access to a local Chrome profile. Those are proportionate to a login-based scraper but are sensitive; the skill also supports configuring a Feishu webhook that would send summaries off-machine. The lack of declared required-env metadata and omission of explicit warnings in metadata is a red flag.

✓

Persistence & Privilege

The skill does not request always:true and does not modify other skills. It writes archives, indexes, and runtime metadata to a local directory (~/.openclaw/data/alphapai-scraper by default) and can produce a sanitized dist for publishing. Autonomous invocation is enabled in the agent interface metadata (allow_implicit_invocation), which is normal; it matters mainly in combination with the credential access described above, if you are concerned about autonomous scraping of protected accounts.

What to consider before installing

- This bundle contains runnable Python scripts (Playwright scraping, SQLite + optional Chroma/transformer vector steps) but the registry entry lists no install steps or dependencies. Expect to need to install Playwright, chromadb/Chroma client, sentence-transformers (and possibly torch), and have the 'openclaw' / 'clawhub' CLIs available. Ask the publisher for an explicit requirements/install list or run it in an isolated VM/venv.
- The skill asks for/uses sensitive auth artifacts: USER_AUTH_TOKEN (env or token file), cookies, username/password, storage_state, or direct access to your Chrome profile. These are necessary for automated login, but only provide them if you trust the code and are comfortable with those credentials being used locally. Prefer using short-lived tokens or manual bootstrap rather than handing over full browser profiles.
- Feishu webhook support will send summaries to an external endpoint if enabled. Ensure webhook_url is correct and intentionally configured; keep feishu.enabled=false if you do not want external transmission.
- The package writes archives and indexes to ~/.openclaw/data/alphapai-scraper by default. Review/relocate that path if you prefer a sandboxed location.
- Because code is included, inspect setup.sh (provided) and the import/use of Playwright and model-loading logic before running. If you plan to publish or run this on shared systems, run it in an isolated environment and review the package_skill.py behavior so you do not accidentally publish secrets.
- If you want to proceed safely: request from the author a clear requirements.txt / install instructions and a justification for any external CLIs required, or run the skill in a disposable container/machine after auditing the scripts.
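
Since the registry entry declares no dependencies, a quick pre-flight check can show what is missing before any script runs. The following is a minimal sketch, not part of the skill: the module and CLI names are taken from the scan text above, and the skill's real requirements may differ.

```python
import importlib.util
import shutil

# Names come from the security scan text; the skill ships no official
# requirements list, so treat this set as an assumption.
PYTHON_MODULES = ["playwright", "chromadb", "sentence_transformers", "torch"]
CLI_TOOLS = ["openclaw", "clawhub"]


def missing_dependencies(modules=PYTHON_MODULES, tools=CLI_TOOLS):
    """Return (missing_modules, missing_tools) without importing anything heavy."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    mods, tools = missing_dependencies()
    print("Missing Python modules:", mods or "none")
    print("Missing CLI tools:", tools or "none")
```

Running this inside the isolated venv or container you plan to use shows what still has to be installed there.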

Like a lobster shell, security has layers — review code before you run it.

MIT-0

AlphaPai Scraper

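This skill now provides two kinds of capability:

Scrape AlphaPai (Alpha派) commentary from the last N hours and save the original text, structured records, and a summary
Query the archived AlphaPai commentary library and generate retrieval summaries by topic and time window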

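When to use

The user wants to scrape AlphaPai commentary from the last hour or the last N hours
The user wants to log in to AlphaPai automatically and reuse a token, cookies, or a username and password
The user wants to archive the original text into a searchable local index
The user asks a historical query such as "all commentary on NVIDIA from the past week"
The user wants the summary sent back to Feishu
The user wants to package this skill into a portable, publishable version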

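Default rules

If the user does not specify a time window, scrape the last 1 hour by default
If the user explicitly says "scrape the last 3 hours", pass --hours 3 at run time
If the user wants to query the historical commentary library, search the last 7 days by default
Original text, structured records, the index database, and summaries are all saved to ~/.openclaw/data/alphapai-scraper by default
Feishu delivery is off by default; summaries are sent only when a webhook is configured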

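Authentication priority

Try the following in order and continue as soon as one succeeds:

Cached storage state
USER_AUTH_TOKEN
cookies.json
Username and password
Local Chrome Profile

If the goal is "most stable and most portable", ask the user for USER_AUTH_TOKEN first. If no token is available, ask for cookies.json. Keep the username/password route for last, because it may hit a CAPTCHA or break when the page changes. If the user is willing to do a one-time guided manual login, you can also run scripts/bootstrap_session.py to cache the session so that later runs reuse it directly.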
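
The priority order above can be sketched as a function that reports which methods look usable, in order. This is an illustration only: the `storage_state.json` file name, the method labels, and the idea that the Chrome Profile can always be attempted are assumptions, not details confirmed by the skill's code.

```python
import os
from pathlib import Path


def available_auth_methods(config_dir, env=None):
    """List auth methods that look usable, in the documented priority order."""
    env = os.environ if env is None else env
    config_dir = Path(config_dir)
    checks = [
        # Hypothetical cache file name; the real location is not documented.
        ("storage_state", (config_dir / "storage_state.json").is_file()),
        ("user_auth_token",
         bool(env.get("USER_AUTH_TOKEN")) or (config_dir / "token.local.json").is_file()),
        ("cookies", (config_dir / "cookies.local.json").is_file()),
        ("credentials", (config_dir / "credentials.local.json").is_file()),
        # Assumed to be always attemptable as the final fallback.
        ("chrome_profile", True),
    ]
    return [name for name, usable in checks if usable]
```

A caller would then try each returned method in turn and stop at the first successful login.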

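First-time configuration

Read only the following files first, and do not paste the example files back into the conversation in full:

config/settings.example.json
config/token.example.json
config/cookies.example.json
config/credentials.example.json

On first use, have the user copy the example files to local files and fill them in:

config/settings.local.json
config/token.local.json
config/cookies.local.json
config/credentials.local.json

If a legacy config/token.json already exists, the scripts still read it for compatibility. For a quick start, you can also run scripts/init_config.py directly to generate settings.local.json.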
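
The copy step can be scripted so that existing local files are never overwritten. The sketch below is a hypothetical helper, not the skill's scripts/init_config.py, whose behavior is not shown here; only the file names come from the lists above.

```python
import shutil
from pathlib import Path

CONFIG_NAMES = ["settings", "token", "cookies", "credentials"]


def copy_example_configs(config_dir):
    """Copy each *.example.json to *.local.json unless the local file exists."""
    config_dir = Path(config_dir)
    created = []
    for name in CONFIG_NAMES:
        src = config_dir / f"{name}.example.json"
        dst = config_dir / f"{name}.local.json"
        if src.is_file() and not dst.exists():
            shutil.copyfile(src, dst)
            created.append(dst.name)
    return created
```

The function returns the names of the files it created, so the user knows which ones still need real values filled in.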

How to run

Standard scrape:

python3 /Users/bot/.openclaw/workspace/skills/alphapai-scraper/scripts/run.py --hours 1

Query commentary on NVIDIA (英伟达) from the last 7 days:

python3 /Users/bot/.openclaw/workspace/skills/alphapai-scraper/scripts/run.py --query 英伟达 --days 7

If the user explicitly wants vector-only fuzzy recall:

python3 /Users/bot/.openclaw/workspace/skills/alphapai-scraper/scripts/run.py --query 英伟达 --days 7 --query-mode vector

To watch the browser while it runs, append:

--headed

To produce files only, without sending to Feishu, append:

--skip-feishu
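
To make the flags and their defaults concrete, here is a minimal `argparse` sketch. It only mirrors the flags and defaults documented on this page; the real run.py may define them differently, and the integer types and the `run_mode` helper are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags; types are assumed."""
    parser = argparse.ArgumentParser(description="AlphaPai scrape/query sketch")
    parser.add_argument("--hours", type=int, default=1, help="scrape window in hours")
    parser.add_argument("--query", help="topic to search in the archived library")
    parser.add_argument("--days", type=int, default=7, help="query window in days")
    parser.add_argument("--query-mode", choices=["hybrid", "exact", "vector"],
                        default="hybrid")
    parser.add_argument("--headed", action="store_true", help="show the browser")
    parser.add_argument("--skip-feishu", action="store_true",
                        help="write files only, send nothing")
    return parser


def run_mode(args):
    """A query string switches the run from scraping to querying (assumed rule)."""
    return "query" if args.query else "scrape"
```

For example, `build_parser().parse_args(["--query", "英伟达"])` yields a 7-day hybrid query, while an empty argument list yields a 1-hour scrape.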

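Scraping strategy

Browser launch order:

Stateless Playwright browser
Local Chrome Profile as a fallback

Content extraction order:

Click the item and scrape the body from the popup
Open the detail link and scrape the body
Fall back to the card text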
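
The extraction order is a plain fallback chain: try each strategy and keep the first one that returns usable text. The sketch below shows only that control flow. The extractor callables are hypothetical stand-ins for the Playwright steps, which are not reproduced here.

```python
def first_non_empty(extractors, item):
    """Try each (name, extractor) in order; return the first usable body text."""
    for name, extract in extractors:
        try:
            text = extract(item)
        except Exception:
            continue  # a failed strategy falls through to the next one
        if text and text.strip():
            return name, text.strip()
    return None, ""


if __name__ == "__main__":
    # Hypothetical extractors operating on a dict instead of a real page.
    strategies = [
        ("popup", lambda item: item["popup_text"]),
        ("detail", lambda item: item.get("detail_text", "")),
        ("card", lambda item: item.get("card_text", "")),
    ]
    print(first_non_empty(strategies, {"card_text": " short card body "}))
```

Returning the strategy name with the text makes it easy to record in the runtime metadata which path produced each item.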

Output

Original text: <output.base_dir>/raw/YYYYMMDD_HHMMSS.md|txt
Structured records: <output.base_dir>/normalized/YYYYMMDD_HHMMSS.json
Index database: <output.base_dir>/index/alphapai.sqlite
Vector index: <output.base_dir>/index/vector/
Summary: <output.base_dir>/reports/YYYYMMDD_HHMMSS_summary.md|txt
Query summary: <output.base_dir>/reports/YYYYMMDD_HHMMSS_query_summary.md
Runtime metadata: <output.base_dir>/runtime/*.json
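
The layout above can be derived from a base directory and one timestamp. This is a minimal sketch: the function name, dictionary keys, and `ext` parameter are illustrative, and only the directory names and file name patterns come from the list above.

```python
from datetime import datetime
from pathlib import Path


def output_paths(base_dir, now=None, ext="md"):
    """Build the per-run output paths under base_dir for a single timestamp."""
    base = Path(base_dir).expanduser()
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return {
        "raw": base / "raw" / f"{stamp}.{ext}",
        "normalized": base / "normalized" / f"{stamp}.json",
        "index": base / "index" / "alphapai.sqlite",
        "vector": base / "index" / "vector",
        "summary": base / "reports" / f"{stamp}_summary.{ext}",
        "query_summary": base / "reports" / f"{stamp}_query_summary.md",
    }
```

Using a single timestamp for the whole run keeps the raw file, structured record, and summary of one scrape easy to match up.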

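Query rules

The default is hybrid mode, which merges SQLite + FTS5 exact search with local Chroma vector recall
If the user explicitly wants "exact search only" or "fuzzy search only", pass --query-mode exact or --query-mode vector respectively
Results are first filtered to the last N days, then titles and bodies are searched with full-text search, supplemented by vector recall
A small set of entity aliases is built in, for example 英伟达 / NVIDIA / NVDA / Blackwell / GB200
If nothing matches, the fixed response is: alphapai最近N天没有相关点评 (that is, "AlphaPai has no relevant commentary in the last N days")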
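
The exact-search half of these rules can be illustrated with Python's built-in `sqlite3` module and an FTS5 table. This is a sketch under stated assumptions: the table name `comments`, its columns, and the alias dictionary are hypothetical, the vector-recall half is omitted, and the SQLite build must have FTS5 enabled.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical alias table modelled on the example aliases in the text.
ALIASES = {"英伟达": ["英伟达", "NVIDIA", "NVDA", "Blackwell", "GB200"]}


def fts_match_expr(query):
    """Expand a query into an FTS5 expression that ORs its quoted aliases."""
    terms = ALIASES.get(query, [query])
    return " OR ".join('"{}"'.format(t.replace('"', '""')) for t in terms)


def search_exact(conn, query, days, now=None):
    """Return matching titles from the last `days` days, or the fixed no-hit text."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT title FROM comments "
        "WHERE comments MATCH ? AND published_at >= ? "
        "ORDER BY published_at DESC",
        (fts_match_expr(query), cutoff),
    ).fetchall()
    if not rows:
        return f"alphapai最近{days}天没有相关点评"
    return [title for (title,) in rows]


def demo_connection():
    """In-memory demo data; the schema is assumed, not the skill's real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE comments USING fts5(title, body, published_at UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?)",
        [
            ("GB200 supply check", "NVIDIA rack shipments", "2024-06-08T09:00:00"),
            ("Old note", "NVDA guidance", "2024-05-01T09:00:00"),
            ("Bank earnings", "net interest margin", "2024-06-09T09:00:00"),
        ],
    )
    return conn
```

Timestamps are stored as ISO 8601 strings so that the cutoff can be applied with a plain string comparison. Note that FTS5's default tokenizer treats an unbroken run of Chinese characters as one token, so real Chinese-language search would likely need a different tokenizer or the vector side.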

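Feishu

If feishu.enabled=true and webhook_url is configured, the script automatically sends the scrape summary or the query summary. Without a webhook, only the local files are kept.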
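
The send-or-skip decision can be sketched as follows. The settings layout (a `feishu` object holding `enabled` and `webhook_url`) is an assumption based on the key names above, and the payload uses the plain-text message shape of Feishu custom bots. The skill's own sender may format messages differently.

```python
import json
import urllib.request


def should_send(settings, skip_feishu=False):
    """Send only if enabled, a webhook is set, and --skip-feishu was not passed."""
    feishu = settings.get("feishu", {})
    return bool(feishu.get("enabled")) and bool(feishu.get("webhook_url")) and not skip_feishu


def build_text_payload(summary):
    """Plain-text message body for a Feishu custom bot webhook."""
    return {"msg_type": "text", "content": {"text": summary}}


def send_summary(settings, summary, skip_feishu=False, timeout=10):
    """POST the summary to the webhook; return False when sending is skipped."""
    if not should_send(settings, skip_feishu):
        return False
    request = urllib.request.Request(
        settings["feishu"]["webhook_url"],
        data=json.dumps(build_text_payload(summary)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```

Keeping the decision in its own function means the default-off behavior can be checked without any network access.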

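Packaging and publishing

Before publishing, do not upload a skill directory that contains real tokens or cookies.

Run this first:

python3 /Users/bot/.openclaw/workspace/skills/alphapai-scraper/scripts/package_skill.py

This generates a sanitized, publishable copy, written by default to:

/Users/bot/.openclaw/workspace/skills/dist/alphapai-scraper

After the user confirms they are logged in to ClawHub, publish from this sanitized copy. If ClawHub is already installed and logged in on this machine, you can also run scripts/publish_skill.py to publish in one step.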
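
The idea behind the sanitized copy can be illustrated with `shutil.copytree` and an ignore filter. This is a hypothetical sketch, not the logic of scripts/package_skill.py, which should be reviewed directly. The pattern list is an assumption derived from the file names mentioned on this page.

```python
import shutil
from pathlib import Path

# Assumed patterns for files that may hold real secrets or local state.
SENSITIVE_PATTERNS = ("*.local.json", "token.json", "storage_state*.json", "__pycache__")


def make_sanitized_copy(skill_dir, dist_dir):
    """Copy skill_dir to dist_dir, skipping sensitive files; list what was copied."""
    dist_dir = Path(dist_dir)
    if dist_dir.exists():
        shutil.rmtree(dist_dir)  # start from a clean target
    shutil.copytree(skill_dir, dist_dir,
                    ignore=shutil.ignore_patterns(*SENSITIVE_PATTERNS))
    return sorted(
        p.relative_to(dist_dir).as_posix() for p in dist_dir.rglob("*") if p.is_file()
    )
```

The target directory must sit outside the source directory, as the default dist path above does. Reviewing the returned file list before publishing is a cheap final check that no credential file slipped through.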
