Web scraping skill using Chrome + WebMCP
Web scraping using Chrome + WebMCP. Primary method for all web crawling tasks.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Alex Wei (@sweihub)
Security Scan
OpenClaw
Suspicious (medium confidence)
Purpose & Capability
The skill's stated purpose (Chrome + WebMCP scraping) matches the instructions for opening pages, snapshots, and interactions. However, it assumes the ability to start and control the host Chrome instance and to enable experimental chrome://flags, yet declares no required binaries, extensions, or config paths. The omission of an explicit dependency on a controllable Chrome/browser environment is a minor coherence gap.
Instruction Scope
Runtime instructions direct the agent to run Chrome on the host (target="host"), start/stop it, open arbitrary URLs, interact with pages, and close other tabs. While necessary for scraping, these operations give the agent access to any currently open browser state (tabs, cookies, logged-in sessions) and could expose private data. The SKILL.md does not limit what browser state may be read (cookies, other tabs) nor explicitly forbid accessing user data outside the target pages.
Install Mechanism
There is no install spec and no code files; this is instruction-only, so nothing is written to disk by the skill itself. That lowers supply-chain risk. The lack of an install step means the skill relies on platform/browser capabilities to perform actions.
Credentials
The skill requests no environment variables or credentials, which is appropriate. However, it implicitly relies on the host browser and its session state (cookies/auth) — a source of sensitive data that isn't declared or constrained in the metadata.
Persistence & Privilege
always: false and standard autonomous invocation are fine. The bigger issue is the explicit recommendation to use target="host" for every task; combined with autonomous invocation this increases the potential blast radius because the agent can act in the user's interactive browser context unless the platform prevents it.
What to consider before installing
This skill instructs the agent to control your host Chrome browser (start it, open pages, interact with tabs, change chrome://flags). That means it could access cookies, logged-in sessions, and other tabs unless the platform prevents that. Before installing: confirm you trust the skill source; ensure the platform restricts host-browser operations or runs the skill in a separate browser profile or sandbox; avoid running it while sensitive sessions are open; do not give it persistent/system-wide permissions without review. If you need only static pages, prefer a sandboxed fetch/web_fetch approach. If you decide to install, test it on non-sensitive sites first and review any platform prompts that grant browser control.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.0
Runtime requirements
🕷️ Clawdis
SKILL.md
Spider — Web Scraping Tool
This is the default web scraping method, replacing older approaches like web_fetch.
Trigger Conditions
Use this skill when the user's request contains any of the following keywords:
| Keywords | Action |
|---|---|
| 抓取 (scrape) / crawl / scrape / fetch | Use Chrome + WebMCP to scrape web pages |
| 采集 (collect) | Same as above |
| 获取...新闻 (get ... news) | Scrape news pages |
| 从...网站 (from ... website) | Scrape the specified website |
| 同花顺 | Scrape Tonghuashun (10jqka) data |
| 东方财富 | Scrape East Money data |
| 雪球 | Scrape Xueqiu data |
| 百度 | Search or scrape Baidu content |
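The trigger table can be read as an ordered keyword lookup. The following is a minimal sketch in Python, assuming the agent receives the raw user message as a string. `TRIGGERS`, `match_trigger`, and the short labels it returns are hypothetical names for illustration and are not part of the skill or the platform.

```python
import re
from typing import Optional

# Hypothetical lookup built from the trigger table. Site names come first so
# that a specific site wins over the generic scrape keywords.
TRIGGERS = [
    (re.compile(r"同花顺"), "tonghuashun"),
    (re.compile(r"东方财富"), "eastmoney"),
    (re.compile(r"雪球"), "xueqiu"),
    (re.compile(r"百度"), "baidu"),
    (re.compile(r"获取.*新闻"), "news"),
    (re.compile(r"从.*网站"), "named_site"),
    (re.compile(r"抓取|采集|crawl|scrape|fetch", re.IGNORECASE), "generic"),
]


def match_trigger(message: str) -> Optional[str]:
    """Return the label of the first matching trigger, or None if none match."""
    for pattern, label in TRIGGERS:
        if pattern.search(message):
            return label
    return None
```

With this ordering, a message such as "从同花顺抓取xxx" resolves to the Tonghuashun entry rather than the generic scrape entry.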
Usage Examples
| User Input | Execution |
|---|---|
| "抓取光库科技的新闻" (scrape news about 光库科技) | Open Tonghuashun in Chrome, extract news |
| "抓取宁德时代的股吧" (scrape the stock forum for 宁德时代, CATL) | Open East Money guba in Chrome |
| "从同花顺抓取xxx" (scrape xxx from Tonghuashun) | Open Tonghuashun page in Chrome |
| "search xxx" | Open Google search in Chrome |
| "查一下xxx" (look up xxx) | Search or scrape in Chrome |
Operation Flow
1. Check Chrome Status
{ action: "status" }
If not running, start it:
{ action: "start" }
2. Open Target Page
{ action: "open", targetUrl: "https://stockpage.10jqka.com.cn/300620/news/", target: "host" }
3. Get Page Snapshot
{ action: "snapshot", targetId: "xxx", maxChars: 20000 }
4. Page Interaction (click, type, etc.)
{ action: "act", targetId: "xxx", request: { kind: "click", ref: "e33" } }
5. Cleanup: Return to about:blank
{ action: "navigate", targetId: "xxx", url: "about:blank" }
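The steps above can be sketched as an ordered list of action payloads. This is a minimal illustration in Python, not the platform's API: `build_scrape_plan` is a hypothetical helper, the payloads are plain dictionaries mirroring the objects shown above, and `target_id` is assumed to be known in advance (in practice it would have to be obtained from the browser tool, which the source shows only as the placeholder "xxx"). The optional interaction step is left out.

```python
from typing import Any, Dict, List


def build_scrape_plan(
    url: str,
    target_id: str,
    chrome_running: bool,
    max_chars: int = 20000,
) -> List[Dict[str, Any]]:
    """Return the action payloads for one scrape, in execution order."""
    plan: List[Dict[str, Any]] = [{"action": "status"}]
    if not chrome_running:
        # Step 1: start Chrome only when the status check says it is not running.
        plan.append({"action": "start"})
    # Step 2: open the target page in the host browser.
    plan.append({"action": "open", "targetUrl": url, "target": "host"})
    # Step 3: read the rendered page.
    plan.append({"action": "snapshot", "targetId": target_id, "maxChars": max_chars})
    # Step 5: cleanup, leave the tab on about:blank.
    plan.append({"action": "navigate", "targetId": target_id, "url": "about:blank"})
    return plan
```

Building the plan as data first keeps the ordering rule (snapshot before cleanup) easy to inspect before anything touches the browser.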
Common Website Templates
Tonghuashun Stock News
URL: https://stockpage.10jqka.com.cn/{stock_code}/news/
Example: https://stockpage.10jqka.com.cn/300620/news/
East Money Guba (Stock Forum)
URL: https://guba.eastmoney.com/list,{stock_code}.html
Example: https://guba.eastmoney.com/list,300620.html
Xueqiu (Snowball)
URL: https://xueqiu.com/S/SZ{stock_code}
Example: https://xueqiu.com/S/SZ300620
Baidu News Search
URL: https://www.baidu.com/s?wd={keyword}&tn=news
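The templates above can be filled in programmatically. The sketch below is one possible way to do it in Python. The dictionary keys and the `build_url` helper are hypothetical names, and the search keyword is percent-encoded with the standard library's `urllib.parse.quote` on the assumption that it may contain spaces or non-ASCII characters.

```python
from urllib.parse import quote

# URL templates copied from the list above; the keys are illustrative names.
URL_TEMPLATES = {
    "tonghuashun_news": "https://stockpage.10jqka.com.cn/{stock_code}/news/",
    "eastmoney_guba": "https://guba.eastmoney.com/list,{stock_code}.html",
    "xueqiu": "https://xueqiu.com/S/SZ{stock_code}",
    "baidu_news": "https://www.baidu.com/s?wd={keyword}&tn=news",
}


def build_url(site: str, value: str) -> str:
    """Fill a template with a stock code, or a keyword for Baidu news search."""
    template = URL_TEMPLATES[site]
    if site == "baidu_news":
        # Percent-encode the keyword so it is safe inside the query string.
        return template.format(keyword=quote(value, safe=""))
    return template.format(stock_code=value)
```

For example, `build_url("tonghuashun_news", "300620")` reproduces the Tonghuashun example URL. Note that the Xueqiu template as given hard-codes the `SZ` prefix, so this sketch only covers stock codes that use it.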
Chrome Setup (One-time)
- Open Chrome Flags and set both of the following:
  - chrome://flags/#enable-experimental-web-platform-features → Enabled
  - chrome://flags/#enable-webmcp-testing → Enabled
- Fully quit Chrome (Cmd+Q on macOS) and restart
Important Rules
- Use target="host" instead of "sandbox"
- Must clean up after each task:
  - If multiple tabs exist, keep only one and close the others
  - The remaining tab must navigate to about:blank
  - If multiple about:blank tabs exist, keep only the latest one and close the others
  - Use browser action: tabs to check current tab status
  - After cleanup, ensure only one about:blank tab remains
- Reuse existing tabs, avoid opening new tabs frequently
- Handle anti-scraping sites: Tonghuashun and East Money need their JavaScript to finish loading before the page content is read
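The tab cleanup rule can be expressed as a small decision function. The sketch below is an illustration under stated assumptions: each tab is a dictionary with `targetId` and `url` keys, the list is ordered from oldest to newest, and `plan_cleanup` is a hypothetical helper. It only decides what to do; how a tab is actually closed depends on the browser tool and is not shown in the source.

```python
from typing import Any, Dict, List


def plan_cleanup(tabs: List[Dict[str, str]]) -> Dict[str, Any]:
    """Decide which tab to keep, which to close, and whether to blank the kept tab."""
    if not tabs:
        return {"keep": None, "close": [], "navigate_to_blank": False}
    # Prefer the latest about:blank tab; otherwise keep the latest tab of any kind.
    blank_tabs = [tab for tab in tabs if tab["url"] == "about:blank"]
    keep = blank_tabs[-1] if blank_tabs else tabs[-1]
    close = [tab["targetId"] for tab in tabs if tab["targetId"] != keep["targetId"]]
    return {
        "keep": keep["targetId"],
        "close": close,
        # The surviving tab must end up on about:blank.
        "navigate_to_blank": keep["url"] != "about:blank",
    }
```

After applying the result, exactly one tab remains and it is on about:blank, which is the end state the rules require.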
Error Handling
| Error | Solution |
|---|---|
| Sandbox unavailable | Use target="host" |
| Slow page load | Wait for the snapshot to return before interacting with the page |
| Content extraction failed | Increase the snapshot's maxChars to get more content |
| Anti-scraping blocked | Try other finance sites or wait and retry |
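The "wait and retry" advice in the table can be sketched as a small helper. This is a minimal Python illustration, not part of the skill: `retry_with_wait` is a hypothetical name, the attempt count and wait time are arbitrary assumptions, and any exception is treated as a retryable failure for simplicity.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_wait(
    operation: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, waiting between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                # Out of attempts: surface the last failure to the caller.
                raise
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the helper can be exercised without real delays. If every attempt fails, the table's other suggestion (trying a different finance site) would be the next step.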
Default Scraping Priority
- Spider (Chrome + WebMCP) ← Primary method
  - Suitable for: Finance websites, stock news, forums
  - Advantages: Full JavaScript rendering, interactive
- web_fetch ← Backup method
  - Suitable for: Simple static pages
  - Disadvantage: Cannot handle JavaScript-rendered pages
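The priority order can be sketched as a try-in-order fallback. The code below is a hedged Python illustration: `scrape_with_priority` is a hypothetical helper, and the scraping methods are passed in as plain callables because the source does not define a programmatic interface for either Spider or web_fetch.

```python
from typing import Callable, Dict, List, Tuple


def scrape_with_priority(
    url: str,
    methods: List[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, method) pair in order; return the first success."""
    errors: Dict[str, Exception] = {}
    for name, method in methods:
        try:
            return name, method(url)
        except Exception as error:
            # Remember the failure and fall through to the next method.
            errors[name] = error
    raise RuntimeError("all methods failed: " + ", ".join(errors))
```

Passing `[("spider", spider_fn), ("web_fetch", fetch_fn)]` (both hypothetical callables) would try the primary method first and use the backup only when the primary raises.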
