Lobster Crawler Skill
v0.7.0: Targeted crawling of book and short-drama content from sites such as Webnovel and ReelShorts, with content grading and DingTalk broadcast.
Security Scan
OpenClaw
Suspicious (medium confidence)
Purpose & Capability
Name and description (crawler + DingTalk broadcast) align with most of the code (scrapy spiders, RSS, broadcast module). However, the declared required binary 'uv' and the install block are incoherent: the install block declares kind: uv with package 'curl_cffi' and bins: ['uv'], but installing curl_cffi would not create a 'uv' binary. The repo also bundles an LLM loop (scripts/claude_loop.sh, prompts/claude_loop_prompt.txt, CLAUDE.md) that a crawler does not strictly need; this increases the runtime footprint beyond the stated crawler + broadcast purpose.
Instruction Scope
SKILL.md's runtime instructions focus on using 'uv run' to drive the CLI (crawl/list/status/broadcast/rss), which is coherent. But repository docs (CLAUDE.md, agent.md, scripts/claude_loop.sh and the prompts) instruct an agent to run continuous LLM loops, to read and update repo docs, and to persist project memory into ~/.claude/projects/... — that is, to write to a global home path and to invoke an external 'claude' binary. Those behaviours (writing to the user's home directory, running an LLM loop) go beyond crawling and are not declared in SKILL.md.
Install Mechanism
The declared install block is inconsistent: kind: uv, package: 'curl_cffi', bins: ['uv']. This does not make sense, because curl_cffi is a Python library, not an installer that yields a 'uv' binary. SKILL.md uses 'uv venv' and 'uv run', implying a dependency on a tool named 'uv', but the install metadata does not install that tool. The repo otherwise uses standard Python dependencies via requirements.txt (pip). This mismatch suggests either broken install metadata or sloppy packaging; treat automatic install as risky.
Credentials
The registry declares a single required env var, DINGTALK_WEBHOOK (primaryEnv), which matches the broadcast feature. However, the code (src/broadcast/dingtalk.py) also reads DINGTALK_SECRET for HMAC signing, and that variable is not declared in requires.env. Additional optional envs appear in config logic (DB_PATH, LOG_LEVEL), and docker-compose references .env files. The skill also includes scripts and docs that reference ~/.claude memory paths and require an external 'claude' CLI — these introduce implicit credentials/configuration and undeclared external network usage.
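For reference, the signing scheme that src/broadcast/dingtalk.py presumably applies when DINGTALK_SECRET is set is DingTalk's documented signed-webhook mechanism: HMAC-SHA256 over "{timestamp}\n{secret}" keyed with the secret, base64-encoded, URL-encoded, and appended to the webhook URL. A minimal sketch (the function name is illustrative, not the skill's actual code):

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def sign_webhook(webhook_url: str, secret: str, timestamp_ms=None) -> str:
    """Append DingTalk's documented timestamp/sign query parameters to a webhook URL."""
    ts = timestamp_ms if timestamp_ms is not None else int(time.time() * 1000)
    # HMAC-SHA256 over "{timestamp}\n{secret}", keyed with the secret itself
    string_to_sign = f"{ts}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    # base64-encode, then URL-encode for use as a query parameter
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("utf-8"))
    return f"{webhook_url}&timestamp={ts}&sign={sign}"
```

If you supply DINGTALK_SECRET, you can verify the code computes exactly this before trusting it with the real value.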
Persistence & Privilege
The skill is not marked always:true (good). Nevertheless, repository docs instruct agents to persist memory into a global ~/.claude/projects/ path, and scripts/claude_loop.sh creates .claude/out and .claude/logs inside the repo and calls the 'claude' binary. The combination of an LLM loop, instructions to write to the home directory automatically, and webhook broadcasting increases the blast radius if run autonomously. None of this is explained in the high-level SKILL.md, and it is outside the crawler's minimal needs.
What to consider before installing
This package appears to be a functioning crawler + DingTalk broadcaster, but I found several red flags you should address before installing or running it:
- Install metadata mismatch: the skill requires a 'uv' CLI but the install block lists package 'curl_cffi' and claims it will create 'uv' — that is inconsistent. Do not run any automatic 'install' step until this is clarified. Prefer creating a Python venv and running 'pip install -r requirements.txt' yourself in an isolated environment.
- Environment variables: you must supply DINGTALK_WEBHOOK for broadcasts; the code also reads DINGTALK_SECRET (for signed webhooks) but that is not declared. If you supply a secret, ensure it's the intended value. Review any .env files before use.
- Hidden agent/LLM behavior: repository docs and scripts instruct running an LLM loop (claude_loop.sh), and to persist agent memory under ~/.claude/projects/... — these actions are unrelated to simple crawling and grant the project the ability to read/write outside the repo and to repeatedly invoke an LLM. Only run these parts if you trust the publisher and understand what will be written and sent.
- Run in isolation: test in a disposable environment (container or VM), with network restricted if necessary. Inspect and, if needed, remove or disable scripts/claude_loop.sh and CLAUDE.md steps that write to your home directory before allowing autonomous runs.
- Verify robots/ethics: review target sites' robots.txt and legal terms — the repository itself has conflicting notes about obeying robots.txt.
If you want, I can list the exact files and lines that reference the problematic install block, the DINGTALK_SECRET usage, and the ~/.claude memory writes so you can inspect them before proceeding.
Like a lobster shell, security has layers — review code before you run it.
Runtime requirements
🦞 Clawdis
OS: macOS · Linux
Any bin: uv
Env: DINGTALK_WEBHOOK
Primary env: DINGTALK_WEBHOOK
Install
Kind: uv
Bins: uv
uv tool install curl_cffi (latest)
Lobster Crawler Skill
Targeted crawling of structured content from Webnovel novels and ReelShorts short dramas, with incremental updates, content grading (high/medium/low), and DingTalk bot broadcast.
Environment setup
Before first use, initialize a Python environment in the skill directory:
cd {{skillPath}}
uv venv .venv
uv pip install -r requirements.txt
No browser installation is required. Anti-bot measures are handled by curl_cffi TLS fingerprint impersonation, a pure-Python library with no system dependencies.
All subsequent commands are executed via uv run, which automatically activates the .venv virtual environment.
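The TLS-impersonation mechanism mentioned above can be illustrated with a short sketch; the fetch helper below is illustrative, not the skill's actual spider code, and assumes curl_cffi has been installed by the setup step:

```python
# Usage sketch of curl_cffi browser impersonation: the impersonate parameter
# makes the TLS handshake look like a real browser, which is why no headless
# browser is needed for anti-bot evasion.
try:
    from curl_cffi import requests as cffi_requests
    HAVE_CURL_CFFI = True
except ImportError:  # third-party dependency, installed via requirements.txt
    HAVE_CURL_CFFI = False

def fetch(url: str) -> str:
    """Fetch a page with a Chrome-like TLS fingerprint."""
    if not HAVE_CURL_CFFI:
        raise RuntimeError("curl_cffi is not installed; run the setup step first")
    resp = cffi_requests.get(url, impersonate="chrome", timeout=30)
    resp.raise_for_status()
    return resp.text
```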
Triggers
Activate this skill when the user's message expresses any of the following intents:
- Crawl/scrape novels, short dramas, or webnovel/reelshorts content
- View crawler status or the list of crawled works
- Broadcast crawl results to DingTalk
- Generate an RSS feed
- Manage scheduled crawl tasks
Commands
All commands must be run from the skill directory: cd {{skillPath}} first, then run the command.
Crawl content
uv run python -m src.cli crawl <spider_name>
Valid spider_name values: webnovel (novels), reelshorts (short dramas). Spider arguments can be passed through:
uv run python -m src.cli crawl webnovel -a max_pages=5
List crawled works
uv run python -m src.cli list [--site <site>] [--grade <grade>] [--limit <n>]
--site: filter by site (webnovel / reelshorts); --grade: filter by grade (high / medium / low); --limit: number of entries to show, default 20
Check system status
uv run python -m src.cli status
Returns database statistics (counts of works, chapters, and episodes) and counts per grade.
Broadcast to DingTalk
uv run python -m src.cli broadcast [--site <site>] [--grade <grade>] [--title <title>]
Generates a Markdown message and sends it to the DingTalk group. Requires the DINGTALK_WEBHOOK environment variable.
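The DingTalk bot API expects a JSON body with msgtype "markdown". A minimal sketch of what the broadcast step sends (helper names are illustrative, not the skill's actual code; the send call needs a real webhook URL):

```python
import json
import urllib.request

def build_markdown_payload(title: str, text: str) -> dict:
    # DingTalk bot markdown message format (documented msgtype "markdown")
    return {"msgtype": "markdown", "markdown": {"title": title, "text": text}}

def send(webhook_url: str, payload: dict) -> None:
    # Network call; webhook_url should come from the DINGTALK_WEBHOOK env var.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)

payload = build_markdown_payload(
    "Crawl report", "## New works\n- Example Title (grade: high)"
)
```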
Manage scheduled tasks
uv run python -m src.cli schedule --action=list # list tasks
uv run python -m src.cli schedule --action=load # load from config
uv run python -m src.cli schedule --action=start # start the scheduler
Generate RSS feed
uv run python -m src.cli rss [--format rss|atom] [--output <path>] [--site <site>] [--grade <grade>]
Output defaults to data/rss.xml.
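The skill's actual feed generator is not shown here. As a rough illustration of the output shape, a minimal RSS 2.0 document can be built with the standard library (titles and links are placeholders, and the real item fields are unknown):

```python
import xml.etree.ElementTree as ET

def build_rss(channel_title: str, items: list) -> bytes:
    # Minimal RSS 2.0 skeleton: one <channel> with required title/link/description,
    # then one <item> per crawled work.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = "https://example.invalid/"
    ET.SubElement(channel, "description").text = "Crawled works"
    for item in items:
        el = ET.SubElement(channel, "item")
        ET.SubElement(el, "title").text = item["title"]
        ET.SubElement(el, "link").text = item["link"]
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)

xml_bytes = build_rss(
    "Lobster Crawler",
    [{"title": "Example Work", "link": "https://example.invalid/w/1"}],
)
```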
Rules
- Before first use, you must run the "Environment setup" steps to install dependencies. If uv run fails with a missing-module error, re-run the setup.
- Before running a spider, run status to confirm the system is healthy.
- If the user did not specify a site, ask whether to crawl webnovel or reelshorts.
- Before broadcasting, run list to confirm there is data to broadcast.
- DingTalk broadcasting requires the DINGTALK_WEBHOOK environment variable to be configured.
- Crawls may take a long time; tell the user in advance and report the results when done.
- Do not run multiple spider instances at the same time, to avoid concurrency conflicts.