webpage-reader-skill
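v0.0.2
Uses the Google Chrome headless browser to download and read web page content, generate a summary, and handle temporary files securely to protect privacy.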
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the delivered code and instructions: the skill uses Chrome headless to fetch HTML, summarize it, and uses temporary files. No unrelated credentials, binaries, or services are requested.
Instruction Scope
SKILL.md and the code instruct the agent to check for Chrome and attempt installation if missing; the runtime does exactly that. Minor mismatches/bugs exist (e.g., macOS chrome detection uses 'which chrome' and Linux distro detection relies on platform.dist(), which is removed in modern Python), so automatic installation may fail on some systems. The instructions ask the agent to run package-manager commands (apt/dnf/brew) which is within scope but requires elevated privileges when invoked.
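Since platform.dist() was removed in Python 3.8, one alternative for Linux distro detection is to read /etc/os-release. The following is a minimal sketch, not the skill's actual code: parse_os_release, pick_package_manager, and detect_package_manager are hypothetical names, and the mapping from distro IDs to apt-get or dnf is an assumption based on the distributions this page mentions.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def pick_package_manager(info):
    """Return 'apt-get', 'dnf', or None (assumed mapping, for illustration)."""
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if any(i in ("debian", "ubuntu") for i in ids):
        return "apt-get"
    if any(i in ("fedora", "rhel", "centos") for i in ids):
        return "dnf"
    return None


def detect_package_manager(path="/etc/os-release"):
    """Read the os-release file if present; return None when it is missing."""
    release_file = Path(path)
    if not release_file.exists():
        return None
    return pick_package_manager(parse_os_release(release_file.read_text()))
```

Returning None for unknown distributions leaves the caller free to fall back to a manual-install message instead of guessing a package manager.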
Install Mechanism
There is no install spec in the registry (instruction-only), which is low-risk. At runtime the code may call system package managers (apt-get, dnf, brew) via subprocess to install Chrome; these calls are local and standard but require sudo/privilege and could prompt the user. No downloads from untrusted URLs or archive extraction are present in the repository.
Credentials
The skill requests no environment variables or external credentials. It inspects a few OS environment paths (PROGRAMFILES, LOCALAPPDATA) only to detect Chrome on Windows, which is reasonable for its purpose.
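As an illustration of this kind of environment-based detection, the sketch below builds candidate chrome.exe paths from PROGRAMFILES and LOCALAPPDATA. It is an assumption about how such a lookup could work, not the skill's code; windows_chrome_candidates is a hypothetical name, and the Google\Chrome\Application sub-path is the conventional Chrome install location rather than something stated in the scan.

```python
import os
from pathlib import PureWindowsPath

# Conventional location of chrome.exe below an install root (assumption).
CHROME_SUFFIX = PureWindowsPath("Google", "Chrome", "Application", "chrome.exe")


def windows_chrome_candidates(env=None):
    """Return candidate chrome.exe paths derived from environment variables."""
    env = os.environ if env is None else env
    candidates = []
    for var in ("PROGRAMFILES", "LOCALAPPDATA"):
        base = env.get(var)
        if base:
            candidates.append(str(PureWindowsPath(base) / CHROME_SUFFIX))
    return candidates
```

The function only reads the two variables and never writes to the environment, which matches the read-only behaviour the scan describes.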
Persistence & Privilege
always is false and the skill does not modify other skills or system-wide agent settings. It uses temporary directories for files and cleans up via tempfile.TemporaryDirectory.
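The cleanup behaviour of tempfile.TemporaryDirectory can be seen in a short sketch. The function name process_in_temp_dir and the file name page.html are hypothetical; only the use of tempfile.TemporaryDirectory comes from the scan.

```python
import tempfile
from pathlib import Path


def process_in_temp_dir(html):
    """Write html to a temporary file, read it back, and return (content, path)."""
    with tempfile.TemporaryDirectory(prefix="webpage_reader_") as tmp:
        page = Path(tmp) / "page.html"
        page.write_text(html, encoding="utf-8")
        content = page.read_text(encoding="utf-8")
    # Leaving the with-block deletes the directory and everything inside it.
    return content, page
```

After the call returns, the content survives only in memory; the returned path no longer exists on disk.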
Assessment
This skill appears coherent with its description, but check these points before installing:
- The skill will execute Google Chrome headless to fetch arbitrary URLs you supply — that means the host will make outbound network requests to those sites.
- If Chrome is missing the skill may run package-manager commands (apt-get/dnf/brew) which can require sudo and may prompt the user; consider installing Chrome yourself beforehand to avoid unintended privilege elevation attempts.
- There are minor implementation bugs (macOS detection and Linux distro detection) that may prevent automatic installation — no evidence of malicious behavior, just brittle code.
- Logs are written to console; the downloaded HTML is read into memory and returned by the skill. If you plan to process sensitive URLs/content, review the code and consider running the skill in a sandboxed environment.
If you want higher assurance, run a quick code review or test in an isolated environment (VM/container) and ensure Chrome is pre-installed so the skill won't attempt package installs.
Like a lobster shell, security has layers: review code before you run it.
latest
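OpenClaw Webpage Downloader Skill
Skill Description
The Webpage Reader skill lets you read and analyze web page content using Google Chrome's headless browser. It can:
- Check whether Google Chrome is installed on the system
- Attempt to install Chrome automatically if it is not found (on supported platforms)
- Download web page content using Chrome's headless mode with tuned parameters
- Read and process the downloaded HTML content
- Generate a summary of the web page content
- Handle temporary files securely to protect your privacy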
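This page does not say how the summary is produced. As one possible approach, the sketch below extracts the page title and the first stretch of visible text using the standard-library html.parser. The names _TextExtractor and summarize_html, the 200-character limit, and the output layout are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the <title> text and visible text, skipping script/style."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def summarize_html(html, max_chars=200):
    """Return 'Title: ...' followed by the first max_chars of visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    body = " ".join(parser.chunks)
    if len(body) > max_chars:
        body = body[:max_chars].rstrip() + "..."
    return f"Title: {parser.title or '(none)'}\n{body}"
```

A truncation-based summary like this is cheap and deterministic, but it says nothing about which parts of the page matter most.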
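Installation Guide
Prerequisites
- Python 3.8 or later
- Google Chrome (detected automatically; installation assistance is offered if it is missing)
Installation Steps
- Install the skill in OpenClaw:
  - Open OpenClaw
  - Go to the Skill Manager
  - Click "Add Skill"
  - Select the directory where you downloaded this skill
  - Click "Install"
Platform-Specific Notes
- Windows: Chrome must be downloaded and installed manually from the Google Chrome website.
- macOS: Automatic installation requires Homebrew. If Homebrew is not installed, install Chrome manually.
- Linux: Automatic installation is supported on Ubuntu/Debian and Fedora/CentOS/RHEL distributions. On other distributions, install Chrome manually.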
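Before any installation is attempted, an existing Chrome binary has to be located. The sketch below shows one way to do that on macOS and Linux; it is not the skill's implementation. find_chrome is a hypothetical name, the executable names tried on PATH are assumptions, and the macOS path is the standard application-bundle location.

```python
import os
import shutil
import sys

# Executable names to try on PATH (assumed list, for illustration).
CANDIDATE_NAMES = ("google-chrome", "google-chrome-stable", "chrome")
# Standard location of the Chrome binary inside the macOS app bundle.
MACOS_APP_BINARY = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def find_chrome(platform=sys.platform, which=shutil.which, exists=os.path.exists):
    """Return a path to a Chrome executable, or None if none is found."""
    if platform == "darwin" and exists(MACOS_APP_BINARY):
        return MACOS_APP_BINARY
    for name in CANDIDATE_NAMES:
        path = which(name)
        if path:
            return path
    return None
```

The which and exists parameters exist only so the lookup can be exercised without a real Chrome installation; in normal use, find_chrome() is called with no arguments.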
Usage Examples
Basic Usage
from webpage_reader import main

result = main("https://example.com")
if result['success']:
    print("Web page downloaded successfully!")
    print("Summary:")
    print(result['summary'])
    print("\nContent preview:")
    # Show at most the first 500 characters of the HTML
    print(result['content'][:500] + "..." if len(result['content']) > 500 else result['content'])
else:
    print(f"Error: {result['message']}")
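Command-Line Usage
python webpage_reader.py https://example.com
OpenClaw Interface Usage
- Open OpenClaw
- Select the Webpage Reader skill
- Enter the URL in the input field
- Click "Run"
- View the results in the output panel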
Technical Details
Chrome Command Parameters
The skill runs Chrome with the following command-line parameters:
google-chrome --headless=new --no-sandbox --disable-gpu --disable-dev-shm-usage --virtual-time-budget=8000 --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36" --hide-scrollbars --blink-settings=imagesEnabled=true --dump-dom <url>
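The same command can be assembled as an argument list and run through subprocess with a timeout. This is a sketch under assumptions, not the skill's download_webpage function: build_chrome_command and dump_dom are hypothetical names, and the 60-second default mirrors the timeout mentioned under Troubleshooting.

```python
import subprocess

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36"
)


def build_chrome_command(url, chrome="google-chrome", virtual_time_budget_ms=8000):
    """Return the argv list equivalent to the command line shown above."""
    return [
        chrome,
        "--headless=new",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        f"--virtual-time-budget={virtual_time_budget_ms}",
        f"--user-agent={USER_AGENT}",
        "--hide-scrollbars",
        "--blink-settings=imagesEnabled=true",
        "--dump-dom",
        url,
    ]


def dump_dom(url, timeout=60):
    """Run Chrome and return the dumped DOM as text.

    Raises subprocess.TimeoutExpired if Chrome exceeds the timeout and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_chrome_command(url),
        capture_output=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Passing the arguments as a list avoids shell quoting problems with the user-agent string and with URLs that contain characters such as & or ?.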
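Output Format
The skill returns a dictionary with the following structure:
{
    "success": bool,   # Whether the operation succeeded
    "message": str,    # Status message
    "content": str,    # Full HTML content of the web page
    "summary": str     # Summary of the web page content
}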
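For callers that want type checking, the structure above can be written as a TypedDict. This is an illustrative sketch: ReaderResult and make_result are hypothetical names, and the assumption that a failed result carries empty content and summary strings is not confirmed by this page.

```python
from typing import TypedDict


class ReaderResult(TypedDict):
    success: bool   # whether the operation succeeded
    message: str    # status message
    content: str    # full HTML content of the page
    summary: str    # summary of the page content


def make_result(success, message, content="", summary=""):
    """Build a result dict with all four documented keys present."""
    return ReaderResult(
        success=success, message=message, content=content, summary=summary
    )
```

Always populating all four keys means callers can index the result directly without checking for missing fields.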
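Troubleshooting
Common Issues
- Chrome not found
  - Solution: Install Google Chrome manually from https://www.google.com/chrome/
- Permission errors
  - Solution: Run the skill with appropriate permissions, especially when installing Chrome on Linux
- Timeout errors
  - Solution: The skill has a 60-second timeout, which may not be enough for large pages. You can change the timeout in the download_webpage function.
- Empty content
  - Solution: Check that the URL is reachable and is not blocked by a CAPTCHA or other anti-scraping measures
- Encoding errors
  - Solution: The skill uses UTF-8 encoding. For pages that use a different encoding, you may need to modify the encoding handling in the read_webpage_content function.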
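For the encoding case, one possible change is to try several encodings in order instead of assuming UTF-8. The sketch below is not the skill's read_webpage_content function; decode_html is a hypothetical name, and the fallback list (utf-8, then gb18030, then latin-1) is an assumption you would adapt to the pages you read.

```python
def decode_html(raw, encodings=("utf-8", "gb18030", "latin-1")):
    """Decode bytes with the first encoding that succeeds.

    Returns a (text, encoding_used) tuple. If every encoding fails, falls
    back to UTF-8 with undecodable bytes replaced.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8"
```

Because latin-1 accepts any byte sequence, it should stay last in the list; anything placed after it would never be tried.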
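Logging
The skill produces detailed logs to help diagnose problems. By default, logs go to the console, but logging can be configured to write to a file if needed.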
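Redirecting logs to a file can be done with the standard logging module. The sketch below assumes the skill logs through a logger named "webpage_reader"; that name and the configure_logging helper are hypothetical, so check the skill's source for the logger it actually uses.

```python
import logging


def configure_logging(log_file=None, level=logging.INFO):
    """Send log records to log_file if given, otherwise to the console."""
    logger = logging.getLogger("webpage_reader")  # assumed logger name
    logger.setLevel(level)
    if log_file:
        handler = logging.FileHandler(log_file, encoding="utf-8")
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.handlers.clear()  # avoid duplicate output on repeated calls
    logger.addHandler(handler)
    return logger
```

Calling configure_logging("reader.log") once at startup would then route subsequent records from that logger to reader.log instead of the console.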
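Contributing
Contributions are welcome! Feel free to submit a Pull Request.
License
This skill is released under the MIT License. See the LICENSE file for details.
Support
If you run into problems or have questions, open an issue on the GitHub repository.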
