soushen
v1.0.0. High-performance Bing search skill, "SouShen Hunter". Uses Playwright's low-level API for deep web search and element extraction. Features: 1. Bing search execution - returns structured search results (title, link, snippet, source). 2. Deep page analysis - extracts all key page elements (links, forms, buttons, scripts, metadata). Triggers: - the user...
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description promise Bing search plus deep page analysis, which matches the included code. However, the implementation collects browser cookies, uses aggressive browser launch flags (e.g. --disable-web-security, --no-sandbox), and applies anti-detection measures; cookies are not mentioned in the public SKILL.md/README even though the code returns them, creating a mismatch between the stated purpose and the data actually collected.
Instruction Scope
SKILL.md instructs running the scripts for search and deep analysis and claims the skill extracts links/forms/buttons/meta/scripts. The runtime code additionally captures cookies (self.page.context.cookies()) and applies anti-detection behavior. The instructions do not warn that cookies or other session state will be extracted and emitted as JSON; extracting cookies expands the scope to potentially sensitive authentication data.
Install Mechanism
No install spec is provided (instruction-only), and the dependencies are standard for a Playwright-based scraper (pip install playwright, plus Chrome). No downloads from untrusted URLs or archive extraction are present in the package metadata.
Credentials
The skill declares no required environment variables; the code does read CHROME_PATH / CHROME_BIN to locate Chrome, which is expected. No other environment or credential access is requested. The main concern is not environment variables but the collection of cookies/session state, which is sensitive even though no explicit credential variables are involved.
Persistence & Privilege
The always flag is false, and the skill does not request persistent system-wide privileges. It does not attempt to modify other skills or agent settings in the provided files.
What to consider before installing
This skill implements exactly the kind of browser automation you would expect from a scraper, but be cautious: the code collects page cookies and session state and uses anti-detection flags that make scraping stealthier. Before installing or running it, consider the following:
- Do not run this on a machine where you are logged into sensitive accounts (Google, Microsoft, banking, etc.). The script collects cookies which could include authentication tokens.
- Review the rest of scripts/bing_search.py (the file listing was truncated) to confirm there is no code that transmits extracted cookies or other data to any external server. If the code prints JSON to stdout, be careful where you direct that output.
- Run the skill in an isolated environment (container or VM) and limit network access if you want to prevent accidental exfiltration.
- If you only need link/form/script metadata, consider editing the code to remove cookies from the PageElements output (delete or sanitize cookies before returning/printing).
- The anti-detection flags (e.g. --disable-web-security, AutomationControlled evasion) are legitimate for scraping but raise privacy and ethical concerns; ensure your usage complies with the target sites' terms of service.
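The cookie-stripping suggestion above can be sketched as a small post-processing step. This is a hypothetical helper: it assumes the deep-analysis output is a JSON object with a top-level "cookies" key, which you should verify against the actual PageElements structure before relying on it.

```python
import json

def sanitize_elements(raw_json: str) -> str:
    """Strip cookie data from the skill's JSON output before storing or
    sharing it. Assumes a top-level "cookies" key (an assumption based
    on the scan findings); adjust the key to match the real output."""
    data = json.loads(raw_json)
    data.pop("cookies", None)  # drop session/auth cookies entirely
    return json.dumps(data, ensure_ascii=False)

# Cookies are removed; other fields pass through untouched.
cleaned = sanitize_elements('{"links": [], "cookies": [{"name": "sid"}]}')
```

Piping the script's stdout through a filter like this keeps the link/form/script metadata while discarding session state.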
If you can provide the remainder of bing_search.py (the truncated portion), I can re-evaluate it for explicit exfiltration or remote endpoints and raise the confidence of this assessment.
Like a lobster shell, security has layers — review code before you run it.
SouShen Hunter (搜神猎手) - Bing Search Skill
A high-performance Bing search engine built on Playwright for deep web information extraction.
Core Features
1. Bing Search
Runs a Bing search and returns structured results:
- Title, URL, snippet, and source site
- Automatic filtering of ads and irrelevant content
- Support for both Chinese and English queries
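The structured result described above can be modeled roughly as follows. The field names here are assumptions drawn from this README; the authoritative SearchResult class lives in scripts/bing_search.py and may differ.

```python
from dataclasses import dataclass, asdict

@dataclass
class SearchResult:
    """Illustrative shape of one Bing hit (field names are assumptions)."""
    title: str
    url: str
    snippet: str
    source: str  # originating site, e.g. "example.com"

hit = SearchResult(
    title="OpenClaw AI Agent",
    url="https://example.com/openclaw",
    snippet="An autonomous agent framework ...",
    source="example.com",
)
record = asdict(hit)  # plain dict, ready for json.dumps
```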
2. Deep Page Analysis
Performs a deep scan of a given URL, extracting:
- All links: text, href, type
- Form details: action, method, input fields
- Button elements: text, type, action
- External scripts: list of JS file URLs
- Page metadata: meta tags, Open Graph, etc.
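The link "type" field above could be derived from the href itself. A minimal sketch of one plausible classification, using a hypothetical helper (the skill's actual labels and rules may differ):

```python
from urllib.parse import urlparse

def classify_link(href: str, base_host: str) -> str:
    """Label an extracted href as "internal", "external", or "other".
    Hypothetical helper; the skill's own type labels may differ."""
    parsed = urlparse(href)
    if parsed.scheme in ("http", "https"):
        return "internal" if parsed.netloc == base_host else "external"
    if href.startswith(("/", "#", "?")):
        return "internal"  # relative to the scanned page
    return "other"  # mailto:, javascript:, data:, etc.

print(classify_link("https://example.com/docs", "example.com"))      # internal
print(classify_link("https://cdn.other.net/app.js", "example.com"))  # external
print(classify_link("mailto:hi@example.com", "example.com"))         # other
```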
Usage
Basic search
python scripts/bing_search.py "OpenClaw AI Agent"
Deep page analysis
python scripts/bing_search.py "placeholder" --deep https://example.com
Python API
import asyncio
from bing_search import BingSearchAgent, SearchResult

async def main():
    async with BingSearchAgent(headless=True) as agent:
        # search
        results = await agent.search("keyword", num_results=10)
        # deep analysis
        elements = await agent.extract_page_elements("https://example.com")

asyncio.run(main())
Requirements
- Python 3.8+
- playwright (pip install playwright)
- Chrome/Chromium browser
Configuration
The script looks for Chrome at the following paths by default:
~/.local/bin/chrome-for-testing-dir/chrome
/usr/bin/google-chrome
/usr/bin/chromium
Custom paths can be set by editing the CHROME_PATHS list in the script.
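The lookup order the security scan describes (CHROME_PATH / CHROME_BIN environment overrides, then the default CHROME_PATHS list) can be sketched like this; the real script's resolution logic may differ:

```python
import os
from typing import Optional

# Defaults mirroring the README's CHROME_PATHS list
CHROME_PATHS = [
    os.path.expanduser("~/.local/bin/chrome-for-testing-dir/chrome"),
    "/usr/bin/google-chrome",
    "/usr/bin/chromium",
]

def find_chrome() -> Optional[str]:
    """Resolve the Chrome binary: env overrides first, then known paths.
    A sketch of the lookup described by the security scan."""
    for env_var in ("CHROME_PATH", "CHROME_BIN"):
        candidate = os.environ.get(env_var)
        if candidate and os.path.isfile(candidate):
            return candidate
    for path in CHROME_PATHS:
        if os.path.isfile(path):
            return path
    return None  # caller should fail with a clear error message
```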
Anti-Detection Features
- Disables the automation-control flag (--disable-blink-features=AutomationControlled)
- Mimics a real user agent
- Sets a realistic viewport size
- Randomizes some behavior patterns
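Taken together, those features map onto Playwright launch and context settings roughly as below. The flag list and user-agent string are assumptions based on this README and the security scan, not the script's exact values; pass launch_args to chromium.launch(args=...) and the rest to browser.new_context(...).

```python
def stealth_context_config() -> dict:
    """Settings matching the anti-detection bullets above (a sketch; the
    skill's exact values may differ). The scan-reported flags weaken
    browser isolation, so prefer running in a disposable environment."""
    return {
        "launch_args": [
            "--disable-blink-features=AutomationControlled",  # hide navigator.webdriver
            "--no-sandbox",            # reported by the security scan
            "--disable-web-security",  # reported by the scan; disables same-origin policy
        ],
        "user_agent": (  # plausible desktop UA string (assumption)
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "viewport": {"width": 1366, "height": 768},  # common desktop size
    }
```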
