Deep Content Search (深度内容搜索)

v1.2.0

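Deep content search tool: aggregates content fetching across multiple platforms, including WeChat Official Accounts, Zhihu, Douban, Toutiao, Baijiahao, Weibo, and Bilibili columns. It can retrieve the full text of WeChat Official Account articles and Zhihu Daily articles, as well as Douban movie information, and it can parse a WeChat link directly to fetch the full article. By default it returns 3 results per platform; the count can be changed. Use this skill when the user needs an in-depth search, article content, or a WeChat link parsed.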


by @lyl340321 · duplicate of @lyl340321/multi-platform-search

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lyl340321/deep-content-search.


Install the skill "深度内容搜索" (lyl340321/deep-content-search) from ClawHub.
Skill page: https://clawhub.ai/lyl340321/deep-content-search
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI


openclaw skills install deep-content-search

ClawHub CLI


npx clawhub@latest install deep-content-search

Security Scan

VirusTotal

Benign


OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The name/description (deep content search across many Chinese platforms) matches the behavior (the code performs web scraping of WeChat and Zhihu). However: the skill claims integration with many platforms (WeChat, Zhihu, Douban, Toutiao, Baijiahao, Weibo, Bilibili) but the visible code mainly shows WechatFetcher and ZhihuFetcher; that discrepancy suggests either incomplete implementation or overstated capabilities. No homepage/source repo is provided and owner is unknown, reducing provenance.

✓

Instruction Scope

SKILL.md instructions are narrowly scoped to performing web requests, parsing HTML, and saving/printing results. It does not instruct the agent to read unrelated local files or environment secrets. The built-in retry logic and scraping strategies are explicit; the only scope concern is that retry/backoff behavior could increase request volume to third-party sites if misused.

ℹ

Install Mechanism

This is instruction-only with a Python script; dependencies are installed via pip (requests, beautifulsoup4, lxml, fake-useragent). Using PyPI packages is expected for a scraper but carries the usual moderate risk of supply-chain issues — there are no downloads from unknown servers or embedded binary installers.

✓

Credentials

No environment variables or credentials are requested. For public scraping this is proportionate. There are no obvious attempts to read other credentials or config paths in the SKILL.md or the visible code.

✓

Persistence & Privilege

Skill is not always-enabled and does not request elevated system persistence. It runs on-demand and does not modify other skills or system-wide settings in the provided material.

What to consider before installing

This skill is a web-scraper that will make outbound requests to third-party sites (WeChat, Zhihu, etc.). Before installing: 1) Review the full deep_search.py for any hardcoded telemetry or unexpected endpoints (the repository/source is missing). 2) Note the README claims many platform integrations — verify the code actually implements those platforms. 3) Consider legal and terms-of-service issues for scraping copyrighted content and respect robots.txt/rate limits; the tool includes retries which can increase load on target sites. 4) Run the code in a sandboxed environment and avoid providing credentials (none are required). If you need broader trust, ask the author for a source repo or signed provenance and confirm which platforms are fully implemented.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🔍 Clawdis

Bins: python3

latest: vk97b4j9gdbt7hjw6xg52madjjx84ttcg

132 downloads

0 stars

3 versions

Updated 1w ago

v1.2.0

MIT-0

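Deep Content Search Tool 🔍

Searches content across 8 platforms and can fetch full article text.

Supported Platforms

Platform	Content completeness	Retrieval method	Notes
WeChat Official Accounts	✅ Full text	Sogou WeChat search	Full articles of roughly 5,000-10,000 characters
Zhihu Daily	✅ Full text	Zhihu Daily API	Full articles of roughly 5,000-10,000 characters
Douban Movies	✅ Movie info + rating	Douban official API	Title, rating, link
Zhihu Q&A	⚠️ Summary	Sogou search	Summary of about 150 characters + real link
Toutiao	⚠️ Summary	360 Search	Summary of about 150 characters + link
Baijiahao	⚠️ Summary	360 Search	Summary of about 150 characters + link
Weibo	⚠️ Summary + user	360 Search	Summary of about 150 characters + username
Bilibili columns	⚠️ Summary	360 Search	Summary of about 150 characters + link

Not supported: Xiaohongshu (strong anti-scraping measures)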

Quick Start

# Search all platforms (default: 3 results per platform)
python3 scripts/deep_search.py "大模型LLM"

# Set the number of results
python3 scripts/deep_search.py "AI" --limit 5

# Search a single platform
python3 scripts/deep_search.py "教程" --source wechat
python3 scripts/deep_search.py "如何学习" --source zhihu
python3 scripts/deep_search.py "人工智能" --source douban
python3 scripts/deep_search.py "大模型" --source toutiao
python3 scripts/deep_search.py "AI" --source baijiahao
python3 scripts/deep_search.py "OpenClaw" --source weibo
python3 scripts/deep_search.py "教程" --source bilibili

# Parse a WeChat link directly (detected automatically)
python3 scripts/deep_search.py "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"

# JSON output
python3 scripts/deep_search.py "OpenClaw" --json

# Save to a file
python3 scripts/deep_search.py "OpenClaw" -o result.md

# Restrict to a named WeChat Official Account
python3 scripts/deep_search.py "教程" --source wechat --account "软件小技"

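Command-Line Arguments

Argument	Short	Description	Default
`keyword`	-	Search keyword (required)	-
`--source`	`-s`	Platform: wechat/zhihu/douban/toutiao/baijiahao/weibo/bilibili/all	all
`--limit`	`-l`	Number of results per platform	3
`--json`	`-j`	Output in JSON format	-
`--output`	`-o`	Output file path	-
`--account`	`-a`	WeChat Official Account name (wechat only)	-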
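
The argument table above maps naturally onto `argparse`. The following is a minimal sketch of such a parser, not the actual code in `deep_search.py`; the `build_parser` name, the help strings, and the choice to model `--json` as a boolean flag are assumptions made for illustration.

```python
import argparse

# Platform names taken from the --source row of the table above.
PLATFORMS = ["wechat", "zhihu", "douban", "toutiao",
             "baijiahao", "weibo", "bilibili", "all"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(prog="deep_search.py")
    parser.add_argument("keyword", help="search keyword or WeChat article link")
    parser.add_argument("-s", "--source", choices=PLATFORMS, default="all",
                        help="platform to search")
    parser.add_argument("-l", "--limit", type=int, default=3,
                        help="number of results per platform")
    # Assumption: --json takes no value and simply switches the output format.
    parser.add_argument("-j", "--json", action="store_true",
                        help="print results as JSON")
    parser.add_argument("-o", "--output", default=None,
                        help="write results to this file")
    parser.add_argument("-a", "--account", default=None,
                        help="WeChat Official Account name (wechat only)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["AI", "--limit", "5", "-s", "wechat"])
print(args.keyword, args.source, args.limit, args.json)
```

Passing an explicit list to `parse_args` keeps the sketch easy to try in an interpreter; a real script would normally let it read `sys.argv`.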

Output Format

Text format

======================================================================
搜索关键词: 大模型LLM
共找到 13 条内容
======================================================================

📱【微信公众号】(3条)
----------------------------------------------------------------------

[1] 【大模型LLM第二篇】openai官方prompt教程详细解读
    公众号: AI蜗牛车
    正文: 前言毕竟openai是大模型的鼻祖，官方推荐的prompt教程...
    链接: https://mp.weixin.qq.com/s?...

📘【知乎】(2条)
----------------------------------------------------------------------

[1] 一文搞懂LLM大模型！LLM从入门到精通万字长文
    来源: 知乎日报 | daily
    统计: 知乎用户 · 10 分钟阅读
    [✓ 完整正文 9827字]

📰【今日头条】(2条)
----------------------------------------------------------------------

[1] 初学者怎么入门大语言模型(LLM)?
    链接: https://www.toutiao.com/...

======================================================================
✅ 搜索完成

JSON format

{
  "keyword": "大模型LLM",
  "total": 13,
  "results": [
    {
      "platform": "wechat",
      "title": "文章标题",
      "author": "公众号名称",
      "content": "完整正文...",
      "url": "https://mp.weixin.qq.com/s?...",
      "publish_time": ""
    },
    {
      "platform": "zhihu",
      "title": "文章标题",
      "source": "知乎日报",
      "content": "完整正文...",
      "link": "https://zhuanlan.zhihu.com/p/...",
      "type": "daily"
    }
  ]
}
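
A consumer of the JSON output might group results by platform. In the sample above, the WeChat entry carries its link under `url` while the Zhihu entry uses `link`, so the sketch below reads either key. The `group_by_platform` helper and the inline sample payload are illustrative and not part of the tool.

```python
import json
from collections import defaultdict


def group_by_platform(payload: str) -> dict:
    """Group (title, link) pairs by platform from a --json payload."""
    data = json.loads(payload)
    grouped = defaultdict(list)
    for item in data.get("results", []):
        # The sample output uses "url" for WeChat and "link" for Zhihu.
        link = item.get("url") or item.get("link") or ""
        grouped[item.get("platform", "unknown")].append(
            (item.get("title", ""), link))
    return dict(grouped)


# Hypothetical payload shaped like the sample above.
sample = json.dumps({
    "keyword": "LLM",
    "total": 2,
    "results": [
        {"platform": "wechat", "title": "A", "url": "https://mp.weixin.qq.com/s?a"},
        {"platform": "zhihu", "title": "B", "link": "https://zhuanlan.zhihu.com/p/1"},
    ],
})
print(group_by_platform(sample))
```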

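Smart Retry Mechanism ⚡

Automatic Retry for WeChat Official Accounts

Sogou WeChat search has anti-scraping measures and may temporarily return empty results. The tool has a built-in retry mechanism:

First search → no results → wait 10 seconds → automatic retry → return results or report failure

How it works:

Search WeChat Official Accounts; if the result is empty
Wait 10 seconds automatically (to let the anti-scraping mechanism cool down)
Send the search request again
If the retry succeeds, return the results as usual
If the retry also fails, report "重试后仍未找到" (still not found after retrying)

Example output: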

正在搜索微信公众号: 伊朗谈判进展
未找到微信文章，10秒后自动重试...
重试搜索微信公众号: 伊朗谈判进展
[1/3] 获取: 美伊谈判取得巨大进展...
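
The retry flow described above (one retry after a 10-second pause when the first search comes back empty) can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: `search_with_retry` is a hypothetical name, and the search function and the sleep function are passed in so the logic can be exercised without network access or real waiting.

```python
import time
from typing import Callable, List


def search_with_retry(search: Callable[[str], List[dict]],
                      keyword: str,
                      wait_seconds: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Run `search`; if it returns nothing, wait once and try again."""
    results = search(keyword)
    if results:
        return results
    # An empty result may be a temporary anti-scraping response: cool down.
    sleep(wait_seconds)
    # Single retry; an empty list here means "still not found after retrying".
    return search(keyword)


# Demonstration with a fake search that fails once, then succeeds.
calls = []

def fake_search(keyword: str) -> List[dict]:
    calls.append(keyword)
    return [] if len(calls) == 1 else [{"title": "example"}]

waits = []
found = search_with_retry(fake_search, "demo", sleep=waits.append)
print(found, waits)
```

Retrying only once keeps the extra load on the search site bounded, which matters because repeated requests are what trigger the anti-scraping response in the first place.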

Direct WeChat Link Parsing 🔗

Overview

Pass a WeChat Official Account article link directly to get its full title and body text:

# Parse a WeChat link directly
python3 scripts/deep_search.py "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"

# JSON output
python3 scripts/deep_search.py "https://mp.weixin.qq.com/s?__biz=xxx" --json

# Save to a file
python3 scripts/deep_search.py "https://mp.weixin.qq.com/s?__biz=xxx" -o article.txt

Output Format

Text format:

======================================================================
📱 微信公众号文章
======================================================================
标题: 文章标题
公众号: 公众号名称
发布时间: 2026-04-14
字数: 5000
链接: https://mp.weixin.qq.com/s?...

----------------------------------------------------------------------
正文内容:
----------------------------------------------------------------------
[完整正文内容]

======================================================================
✅ 获取完成

JSON format:

{
  "platform": "wechat",
  "title": "文章标题",
  "author": "公众号名称",
  "content": "完整正文...",
  "url": "https://mp.weixin.qq.com/s?...",
  "publish_time": "2026-04-14",
  "word_count": 5000
}

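How It Works

Detect a WeChat link → request the HTML directly → parse with BeautifulSoup → extract title / account name / body

Target elements:

#activity-name → title
#js_name → Official Account name
#publish_time → publish time
#js_content → body content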
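
The element IDs listed above are enough to sketch the extraction step. The tool itself is described as using BeautifulSoup; the example below swaps in the standard-library `html.parser` so that it runs without third-party packages. `ArticleParser`, `parse_article`, and the field names are hypothetical, and the sketch assumes well-formed markup in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Element id -> output field, from the list of target elements above.
TARGET_IDS = {"activity-name": "title", "js_name": "author",
              "publish_time": "publish_time", "js_content": "content"}
# Tags that never have a closing tag and so must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr", "source"}


class ArticleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {name: "" for name in TARGET_IDS.values()}
        self._current = None  # field currently being collected
        self._depth = 0       # open tags inside the current target element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self._current is not None:
                self.fields[self._current] += " "
            return
        if self._current is not None:
            self._depth += 1
            return
        field = TARGET_IDS.get(dict(attrs).get("id"))
        if field:
            self._current, self._depth = field, 1

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID_TAGS:
            return
        self.fields[self._current] += " "  # keep adjacent blocks separated
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def parse_article(html: str) -> dict:
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace; only plain text is kept.
    return {k: " ".join(v.split()) for k, v in parser.fields.items()}


page = ('<h1 id="activity-name"> Title </h1><a id="js_name">Acct</a>'
        '<em id="publish_time">2026-04-14</em>'
        '<div id="js_content"><p>Hello<br>world</p><img src="x.png"/>'
        '<p>End</p></div><p>ignored</p>')
print(parse_article(page))
```

With BeautifulSoup the same lookup is typically a one-liner per field (for example, selecting an element by id and reading its text), which is presumably why the tool depends on it.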

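Notes

The link must be a valid WeChat article link (mp.weixin.qq.com/s?...)
Some articles may be unavailable because of access restrictions
Only plain text is returned; images and formatting must be handled separately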
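
Automatic link detection can be approximated by checking the host and path of the input before treating it as a keyword. This is a sketch of one possible check, not the tool's actual rule: `is_wechat_article_url` is a hypothetical helper, and accepting the `/s/<id>` short form in addition to the documented `/s?...` form is an assumption.

```python
from urllib.parse import urlparse


def is_wechat_article_url(text: str) -> bool:
    """Return True if `text` looks like a WeChat article link."""
    parsed = urlparse(text.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.hostname != "mp.weixin.qq.com":
        return False
    # Documented form is /s?...; the /s/<id> short form is an assumption.
    return parsed.path == "/s" or parsed.path.startswith("/s/")


print(is_wechat_article_url("https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"))
print(is_wechat_article_url("大模型LLM"))
```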

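Content Completeness

✅ Platforms with full text available

Platform	Length	How it works
WeChat Official Accounts	5,000-10,000 characters	Sogou WeChat search → extract the real WeChat link → fetch the full HTML body
Direct WeChat link parsing	5,000-10,000 characters	Request the WeChat link directly → parse the HTML for title + body
Zhihu Daily	5,000-10,000 characters	The public Zhihu Daily API (news-at.zhihu.com) returns the full text directly
Douban Movies	Movie info	The Douban movie API (movie.douban.com) returns title + rating + link

⚠️ Platforms limited to summaries

Platform	Reason
Zhihu Q&A / columns	zse-ck anti-scraping check; returns 403 when not logged in
Toutiao	Security verification; returns a simplified page
Baijiahao	Blocked by Baidu security verification
Weibo	API returns a 432 error; login required
Bilibili columns	Returns a simplified 3 KB page (normally 30 KB+) with the content hidden

Workaround: for these platforms the tool provides a summary and the real link, so users can open the link manually to read the full content.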

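Zhihu Retrieval Strategy

Zhihu content is retrieved in two ways:

Zhihu Daily (preferred)
- Search the last 30 days of Zhihu Daily history
- Match the keyword and fetch the full text (about 5,000-10,000 characters)
- Uses the public API: news-at.zhihu.com/api/4/news/{id}
Zhihu Q&A / columns (fallback)
- Used via Sogou search when Zhihu Daily has no matching article
- Returns a summary (about 150 characters) + the real link
- Users can open the link manually for the full content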
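
The two-tier strategy above can be sketched as a function that tries Zhihu Daily first and falls back to the Sogou-based search only when nothing matches. The names `search_zhihu` and `recent_dates` are hypothetical, the fetchers are injected as plain callables so no network access is needed, and the `YYYYMMDD` date format is an assumption rather than something the source specifies.

```python
from datetime import date, timedelta
from typing import Callable, List

# A fetcher takes (keyword, limit) and returns a list of result dicts.
Fetcher = Callable[[str, int], List[dict]]


def recent_dates(today: date, days: int = 30) -> List[str]:
    """Dates of the last `days` days, newest first (format is an assumption)."""
    return [(today - timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def search_zhihu(keyword: str, limit: int,
                 search_daily: Fetcher, search_sogou: Fetcher) -> List[dict]:
    """Prefer full-text Zhihu Daily hits; fall back to summaries otherwise."""
    results = search_daily(keyword, limit)[:limit]
    if results:
        return results
    # Zhihu Daily has no matching article: use search-engine summaries.
    return search_sogou(keyword, limit)[:limit]


# Stub fetchers for demonstration only.
daily_miss = lambda kw, n: []
sogou_stub = lambda kw, n: [{"type": "answer", "title": kw}] * n
print(search_zhihu("LLM", 2, daily_miss, sogou_stub))
print(recent_dates(date(2026, 4, 14), 3))
```

A variant that tops up a short Zhihu Daily result list with summaries would also fit the description of Q&A results as a supplement; the source does not say which behavior the tool implements.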

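How It Works

WeChat Official Account scraping

Sogou WeChat search → get the intermediate link → extract the real WeChat link from the JS redirect → visit the WeChat article to fetch the full text

Key trick: the WeChat link is stored in fragments across url += 'xxx' statements, which must be concatenated to recover the real link.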
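
Concatenating the `url += 'xxx'` fragments can be done with a single regular expression. The sketch below assumes the fragments are single-quoted string literals appearing in order; `extract_real_url` and the sample script are illustrative, and the real redirect page may need additional cleanup that is not covered here.

```python
import re

# Matches statements of the form:  url += '...';
FRAGMENT_RE = re.compile(r"url\s*\+=\s*'([^']*)'")


def extract_real_url(js_source: str) -> str:
    """Join every url += '...' fragment in source order."""
    return "".join(FRAGMENT_RE.findall(js_source))


# Hypothetical redirect script shaped like the description above.
script = """
var url = '';
url += 'https://mp.';
url += 'weixin.qq.com/s?src=11';
url += '&timestamp=1';
window.location.replace(url);
"""
print(extract_real_url(script))
```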

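Zhihu Daily scraping

Zhihu Daily API → returns JSON directly → includes the full body HTML

The API is publicly accessible and has no anti-scraping restrictions.

Douban Movies scraping

Douban movie API → returns a list of movies → includes title, rating, link

API: movie.douban.com/j/search_subjects

Other platforms

360/Bing search → extract the search-result summary → provide the real link

Because of each platform's anti-scraping measures, only the summary returned by the search engine is available.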

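Citation Rules ⚠️ [Mandatory]

When an AI uses this tool's search results, it must follow the rules below:

1. Output format requirements

When the AI presents information to the user after a search:

It must include:

A summary table of result counts per channel (generated automatically by the tool)
A source label for every piece of information

2. Source label format

Platform	Label format	Example
WeChat Official Accounts	`来源：公众号「名称」`	来源：公众号「潇湘晨报」
Zhihu Daily	`来源：知乎日报`	来源：知乎日报
Zhihu Q&A	`来源：知乎 + 链接`	来源：知乎（链接）
Weibo	`来源：微博 @用户名`	来源：微博 @CCTV国际时讯
Toutiao	`来源：今日头条 + 链接`	来源：今日头条
Baijiahao	`来源：百家号 + 链接`	来源：百家号
Bilibili columns	`来源：B站专栏 + 链接`	来源：B站专栏
Douban	`来源：豆瓣`	来源：豆瓣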
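
The label formats in the table can be produced mechanically from a result record. The following sketch assumes records shaped like the JSON output shown earlier (`platform`, `author`, `type`, and `url` or `link`); `format_source` is a hypothetical helper, the use of `author` for the Weibo username is an assumption, and placing the link in full-width parentheses follows the Zhihu Q&A example in the table.

```python
# Display names for platforms whose label is "name + link".
LINKED_NAMES = {"zhihu": "知乎", "toutiao": "今日头条",
                "baijiahao": "百家号", "bilibili": "B站专栏"}


def format_source(item: dict) -> str:
    """Build the source label for one search result."""
    platform = item.get("platform", "")
    if platform == "wechat":
        return f"来源：公众号「{item.get('author', '')}」"
    if platform == "weibo":
        # Assumption: the Weibo username is stored under "author".
        return f"来源：微博 @{item.get('author', '')}"
    if platform == "zhihu" and item.get("type") == "daily":
        return "来源：知乎日报"
    if platform == "douban":
        return "来源：豆瓣"
    name = LINKED_NAMES.get(platform, platform)
    link = item.get("url") or item.get("link") or ""
    return f"来源：{name}（{link}）" if link else f"来源：{name}"


print(format_source({"platform": "wechat", "author": "潇湘晨报"}))
print(format_source({"platform": "zhihu", "type": "daily"}))
```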

3. Correct example

### 伊朗谈判最新进展

**美伊各说各话**

1. 特朗普称谈判取得重大进展
   来源：公众号「柚地理」

2. CIA前局长更相信伊朗说法
   来源：公众号「潇湘晨报」

---

📊 各渠道检索结果:
  📱 微信公众号: ✅ 3条
  📘 知乎: ✅ 5条
  💬 微博: ❌ 0条
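
The per-channel summary at the end of the example is described as generated by the tool. A minimal sketch of how such a block could be built from a result list is shown below; `channel_summary` and the `CHANNELS` list are hypothetical, with labels and icons copied from the sample outputs in this document.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# (platform key, display label) pairs; labels copied from the samples above.
CHANNELS = [("wechat", "📱 微信公众号"), ("zhihu", "📘 知乎"),
            ("toutiao", "📰 今日头条"), ("weibo", "💬 微博")]


def channel_summary(results: List[dict],
                    channels: Sequence[Tuple[str, str]] = CHANNELS) -> str:
    """Render one line per channel with a hit/miss mark and a count."""
    counts = Counter(item.get("platform") for item in results)
    lines = ["📊 各渠道检索结果:"]
    for key, label in channels:
        n = counts.get(key, 0)
        mark = "✅" if n else "❌"
        lines.append(f"  {label}: {mark} {n}条")
    return "\n".join(lines)


demo = [{"platform": "wechat"}] * 3 + [{"platform": "zhihu"}] * 5
print(channel_summary(demo))
```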

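4. Incorrect examples

❌ 伊朗外长表示谈判取得进展。 (no source given)
❌ 据媒体报道，美伊正在谈判。 (vague source)
❌ 网上有人说... (no source)

5. Source credibility ranking

Level	Source	Credibility
⭐⭐⭐	Official media such as CCTV and Phoenix; major Official Accounts	Highest
⭐⭐	Full-text Zhihu Daily articles; well-known Official Accounts	Fairly high
⭐	Zhihu Q&A summaries; ordinary Weibo users	Needs verification
⚠️	Toutiao and Baijiahao summaries	May be clickbait

6. Special cases

Several items reporting the same information: cite them together, e.g. 「来源：公众号「潇湘晨报」「极目新闻」」
Conflicting information: point it out explicitly, e.g. 「公众号A称...，但公众号B否认」
Time-sensitive information: include the date, e.g. 「来源：公众号「XX」（2026年3月26日）」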

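Notes

Rate limits
- Search engines have anti-scraping measures; requests are automatically spaced 1 second apart
- Many requests to 360 Search in a short period will trigger a CAPTCHA
Legal compliance
- For study and research use only
- Respect the original authors' copyright
- Do not use for commercial purposes
Network requirements
- Requires access to: weixin.sogou.com, mp.weixin.qq.com, news-at.zhihu.com, movie.douban.com, so.com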
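
The one-second spacing between requests mentioned under the rate limits can be enforced with a small throttle object. This is a sketch of the general technique rather than the tool's implementation; `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so the behavior can be checked without real delays.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds between successive calls."""

    def __init__(self, min_interval: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Demonstration with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []

def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_now[0] += seconds

throttle = Throttle(1.0, clock=lambda: fake_now[0], sleep=fake_sleep)
throttle.wait()      # first call never waits
fake_now[0] += 0.25  # only 0.25 s have passed
throttle.wait()      # waits for the remaining 0.75 s
print(slept)
```

Calling `wait()` immediately before each outbound request keeps the spacing independent of how long parsing takes between requests.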
Installing dependencies

pip install requests beautifulsoup4 lxml fake-useragent

File Structure

deep-search/
├── SKILL.md              # Skill documentation
└── scripts/
    └── deep_search.py    # Main script (about 1,500 lines)

Made with ❤️ by OpenClaw Community
