{"skill":{"slug":"web-fetcher","displayName":"Web Fetcher","summary":"Smart web content fetcher - articles and videos from WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, etc. Triggers: '抓取文章', '下载网页', '保存文章', 'fetch URL', '...","description":"---\nname: web-fetcher\ndescription: \"Smart web content fetcher - articles and videos from WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, etc. Triggers: '抓取文章', '下载网页', '保存文章', 'fetch URL', '下载视频', '抓取飞书文档', '抓取微信文章', '把这个链接内容保存下来', '下载B站视频', 'download video', 'scrape article'.\"\nversion: 0.1.1\nlicense: Complete terms in LICENSE\n---\n\n# Web Fetcher\n\nSmart web content fetcher for Claude Code. Automatically detects platform and uses the best strategy to fetch articles or download videos.\n\n## Quick Start\n\n```bash\n# Fetch an article\npython3 {SKILL_DIR}/fetcher.py \"URL\" -o ~/docs/\n\n# Download a video\npython3 {SKILL_DIR}/fetcher.py \"https://b23.tv/xxx\" -o ~/videos/\n\n# Batch fetch from file\npython3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/\n```\n\n## Install Dependencies\n\nInstall only what you need — dependencies are checked at runtime:\n\n| Dependency | Purpose | Install |\n|-----------|---------|---------|\n| scrapling | Article fetching (HTTP + browser) | `pip install scrapling` |\n| yt-dlp | Video download | `pip install yt-dlp` |\n| camoufox | Anti-detection browser (Xiaohongshu, Weibo) | `pip install camoufox && python3 -m camoufox fetch` |\n| html2text | HTML to Markdown conversion | `pip install html2text` |\n\n## Smart Routing\n\nThe fetcher automatically detects the platform from the URL:\n\n| Platform | Method | Notes |\n|----------|--------|-------|\n| mp.weixin.qq.com | scrapling | Extracts `data-src` images, handles SVG placeholders |\n| *.feishu.cn | Virtual scroll | Collects all blocks via scrolling, downloads images with cookies |\n| zhuanlan.zhihu.com | scrapling | `.Post-RichText` selector |\n| www.zhihu.com | scrapling | `.RichContent` selector |\n| www.toutiao.com | 
scrapling | Handles `toutiaoimg.com` base64 placeholders |\n| www.xiaohongshu.com | camoufox | Anti-bot protection requires stealth browser |\n| www.weibo.com | camoufox | Anti-bot protection requires stealth browser |\n| bilibili.com / b23.tv | yt-dlp | Video download, supports quality selection |\n| youtube.com / youtu.be | yt-dlp | Video download |\n| douyin.com | yt-dlp | Video download |\n| Unknown URLs | scrapling | Generic fetch with fallback tiers |\n\n## CLI Reference\n\n```\npython3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]\n\nArguments:\n  url                    URL to fetch\n\nOptions:\n  -o, --output DIR       Output directory (default: current)\n  -q, --quality N        Video quality, e.g. 1080, 720 (default: 1080)\n  --method METHOD        Force method: scrapling, camoufox, ytdlp, feishu\n  --selector CSS         Force CSS selector for content extraction\n  --urls-file FILE       File with URLs (one per line, # for comments)\n  --audio-only           Extract audio only (video downloads)\n  --no-images            Skip image download (articles)\n  --cookies-browser NAME Browser for cookies (e.g., chrome, firefox)\n```\n\n## Platform Notes\n\n### WeChat (mp.weixin.qq.com)\n- Images use `data-src` attribute with `mmbiz.qpic.cn` URLs\n- Visible `<img>` tags contain SVG placeholders (lazy loading)\n- Image download requires `Referer: https://mp.weixin.qq.com/` header\n- Scrapling GET usually works; no browser needed\n\n### Feishu (*.feishu.cn)\n- Uses virtual scroll — content blocks are rendered on-demand\n- The fetcher scrolls through the entire document, collecting `[data-block-id]` elements\n- Images require authenticated fetch (cookies), downloaded via browser's fetch API\n- May show \"Unable to print\" artifacts which are auto-cleaned\n\n### Bilibili\n- Short links (b23.tv) are auto-resolved\n- For premium/member content, use `--cookies-browser chrome`\n- Default quality is 1080p, adjustable with `-q`\n\n## Troubleshooting\n\n| Problem | Solution 
|\n|---------|----------|\n| `scrapling not found` | `pip install scrapling` |\n| `yt-dlp not found` | `pip install yt-dlp` |\n| Article content too short | Try `--method camoufox` for JS-heavy pages |\n| Feishu returns login page | The doc may require authentication |\n| Bilibili 403 | Use `--cookies-browser chrome` |\n| Image download fails | Check network; WeChat images need Referer header (auto-handled) |\n\n## Manual Usage\n\nWhen the CLI doesn't fit your needs, use the modules directly:\n\n```python\nfrom lib.router import route, check_dependency\nfrom lib.article import fetch_article\nfrom lib.video import fetch_video\nfrom lib.feishu import fetch_feishu\n\n# Route a URL (placeholder; substitute a real link)\nurl = \"https://mp.weixin.qq.com/s/xxx\"\nr = route(url)\n# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}\n\n# Fetch article\nfetch_article(url, output_dir=\"/tmp/out\", route_config=r)\n\n# Download video (url must point to a video page, e.g. a b23.tv link)\nfetch_video(url, output_dir=\"/tmp/out\", quality=\"720\")\n\n# Fetch Feishu doc (url must be a *.feishu.cn document)\nfetch_feishu(url, output_dir=\"/tmp/out\")\n```\n","topics":["Feishu","WeChat"],"tags":{"latest":"0.1.1"},"stats":{"comments":0,"downloads":1040,"installsAllTime":39,"installsCurrent":4,"stars":0,"versions":1},"createdAt":1773119607957,"updatedAt":1778997227255},"latestVersion":{"version":"0.1.1","createdAt":1773119607957,"changelog":"Initial release: smart web content fetcher for articles and videos","license":"MIT-0"},"metadata":null,"owner":{"handle":"alexxxiong","userId":"s17e7cgyesfsxqax7wb080fe1n8857ak","displayName":"alexxiong","image":"https://avatars.githubusercontent.com/u/45865830?v=4"},"moderation":null}