AI Research Scraper

v1.8.14

A skill for scraping the latest research information in the AI field, with a focus on AI product development. It gathers information from well-known AI websites, provides concise summaries with links, and limits data volume for quick reading.

Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
SKILL.md says it scrapes AI sites and provides concise summaries. The code does implement scraping and summarization, but it also depends on an external 'tavily-search' skill (invoking /root/.openclaw/workspace/skills/tavily-search/scripts/search.mjs via subprocess) and references multiple translation API clients/tests. The skill's registry metadata declares no required env vars or binaries, yet the code implies node must be present and external API keys may be needed. Those dependencies are not declared and are disproportionate to the simple description.
Instruction Scope
SKILL.md runtime instruction is to run scripts/scraper.py. The scripts do call subprocess to run another skill's Node script and make network calls to translation/search endpoints. Some scripts (e.g., simple_scraper.py) import translate_text from scraper.py, but scraper.py does not define translate_text—indicating broken or inconsistent instructions. The instructions do not mention the tavily-search dependency, Node requirement, or required API keys.
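Until the maintainer fixes the missing function, a caller that needs simple_scraper.py's behavior could guard the import with a passthrough fallback. This is a defensive sketch, not code from the package:

```python
import importlib


def load_translate_text():
    """Return scraper.translate_text if it exists; otherwise a passthrough.

    scraper.py does not currently define translate_text, so the fallback
    simply returns the input text untranslated.
    """
    try:
        mod = importlib.import_module("scraper")
        return getattr(mod, "translate_text")
    except (ImportError, AttributeError):
        # Module or attribute missing: degrade gracefully instead of crashing.
        return lambda text, target_lang="en": text
```

This keeps the pipeline running without translation, which matches the package's own stated direction of temporarily removing translation support.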
Install Mechanism
There is no install spec (instruction-only), so nothing will be automatically downloaded or installed by the registry. However the package includes runnable Python scripts that assume a runtime (node) and presence of another skill's files; running them will execute local code and spawn external processes. Lack of an install step reduces supply-chain risk but does not eliminate execution risk when the scripts are run.
Credentials
The skill declares no required environment variables or credentials, yet references translation APIs and Tavily Search in docs and tests that expect API keys (e.g., TAVILY_API_KEY, Google/Microsoft/Baidu/Youdao credentials). Tests and api_reference.md show explicit API-key usage. This mismatch (code/doc expecting secrets but metadata declaring none) is a red flag: running tests or enabling translation/search fallbacks could prompt you to add sensitive credentials without the skill declaring why.
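Before running anything, you can audit which of the referenced credentials your environment would expose to the scripts. Only TAVILY_API_KEY is named in the package's tests; the translation-provider variable names below are hypothetical stand-ins:

```python
# TAVILY_API_KEY appears in the package's tests; the other two names are
# hypothetical placeholders for the Google/Microsoft/Baidu/Youdao credentials
# the docs allude to.
REFERENCED_KEYS = ["TAVILY_API_KEY", "BAIDU_TRANSLATE_KEY", "YOUDAO_APP_KEY"]


def exposed_keys(env: dict) -> list[str]:
    """Return the referenced credential names that are set in `env`."""
    return [k for k in REFERENCED_KEYS if env.get(k)]
```

Calling `exposed_keys(dict(os.environ))` before invoking the scraper shows exactly which undeclared secrets its subprocesses could read.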
Persistence & Privilege
The skill does not request always:true and is user-invocable only. It does not modify other skills or system-wide settings in the provided code. However it does spawn subprocesses that execute another skill's script from the workspace, which effectively grants it the ability to run code located in other skill directories—this increases runtime blast radius even though the skill itself does not persistently install or escalate privileges.
What to consider before installing
This skill is internally inconsistent and should be treated with caution. Before installing or running it:

  • Inspect the tavily-search skill referenced at /root/.openclaw/workspace/skills/tavily-search/scripts/search.mjs (the scraper invokes that Node script). Ensure you trust that code and that Node is the expected runtime.
  • Do not provide API keys or other credentials until the skill explicitly documents and declares them in metadata. The repo mentions Tavily and various translation APIs, but the metadata lists no required env vars.
  • Be aware some scripts reference missing functions (e.g., translate_text) or rely on files that may not exist; the package may be broken or incomplete.
  • Run the scripts in a sandbox (non-root container) first and review outbound network calls to confirm they contact expected services (official blogs, known translation endpoints, or the Tavily API), not unknown domains.
  • If you need this functionality, ask the maintainer for an explicit dependency list (Node, required skills), a manifest of required environment variables, and a fix for the missing/unused functions before granting any credentials or running the skill in production.

Like a lobster shell, security has layers — review code before you run it.

2.3k downloads · 1 star · 23 versions · Updated 1 mo ago
v1.8.14 · MIT-0

AI Research Scraper

Overview

This skill scrapes the latest research information from well-known websites in the AI field, with a focus on AI product development. It limits the amount of data scraped and the tokens used, provides concise summaries, and includes a link to the original page in each summary so users can read the full content.

Use Cases

  • When you need a quick overview of the latest AI product news
  • When you want to track the development of a specific AI product
  • When you need summaries of the latest research results in AI technology

Supported Websites

Currently supported well-known AI websites include:

  • TechCrunch AI section
  • VentureBeat AI section
  • MIT Technology Review AI section
  • Google AI Blog
  • Microsoft AI Blog
  • NVIDIA Blog

Features

  • Data volume control: limits summary length to keep information concise
  • Focused: concentrates on AI product development news
  • Complete links: every summary includes a link to the original page
  • Scheduled updates: supports periodic scraping of the latest information
  • Configurable: target websites can be added or modified
  • Network hardening: longer timeouts and a retry mechanism improve scraping success rates
  • Caching: a 1-hour cache avoids re-scraping identical content
  • Stable sources: the website list is tuned toward more reliable AI sources
  • Search fallback: can fall back to the tavily-search skill to avoid network timeouts
  • Simplified processing: translation is temporarily removed to avoid API errors and network timeouts
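The tavily-search fallback is what drives the security review's undeclared-dependency finding: the scraper shells out to another skill's Node script. A minimal sketch of such an invocation (the script path comes from this page; the argument format and output handling are assumptions):

```python
import subprocess

# Path referenced by the scraper, per the security review above.
TAVILY_SCRIPT = "/root/.openclaw/workspace/skills/tavily-search/scripts/search.mjs"


def build_search_command(query: str) -> list[str]:
    # Requires the `node` runtime and the tavily-search skill on disk --
    # neither of which this skill's metadata declares.
    return ["node", TAVILY_SCRIPT, query]


def tavily_search(query: str, timeout: int = 30) -> str:
    result = subprocess.run(
        build_search_command(query),
        capture_output=True, text=True, timeout=timeout,
    )
    result.check_returncode()
    return result.stdout
```

Passing the arguments as a list (no shell=True) at least avoids shell injection through the query string, but the spawned script still runs with the caller's full privileges.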

Usage

Quick Start

Scrape AI product development news with the default configuration:

python3 /root/.openclaw/workspace/skills/ai-research-scraper/scripts/scraper.py

Configuration Options

Custom Website List

Edit the references/websites.txt file to add or remove target websites. Each line uses the format:

Site name|Site URL|RSS/Feed URL (optional)
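A sketch of how scraper.py might parse each line of this file (the field names and the handling of the optional third column are assumptions based on the format above):

```python
def parse_website_line(line: str) -> dict:
    """Parse 'Site name|Site URL|RSS/Feed URL (optional)' into a dict."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 2:
        raise ValueError(f"expected at least 'name|url', got: {line!r}")
    return {
        "name": parts[0],
        "url": parts[1],
        # Third field is optional; empty or missing means no feed URL.
        "feed": parts[2] if len(parts) > 2 and parts[2] else None,
    }
```

For example, `parse_website_line("Google AI Blog|https://blog.google/technology/ai/")` yields an entry with no feed URL.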

Limit Summary Length

python3 /root/.openclaw/workspace/skills/ai-research-scraper/scripts/scraper.py --max-tokens 500

Specify a Time Range

python3 /root/.openclaw/workspace/skills/ai-research-scraper/scripts/scraper.py --days 7

Specify a Topic Focus

python3 /root/.openclaw/workspace/skills/ai-research-scraper/scripts/scraper.py --topic product-development
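Taken together, the three flags above suggest an argparse interface along these lines (only the flag names come from this README; the defaults and help text are assumptions):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="AI research scraper")
    parser.add_argument("--max-tokens", type=int, default=500,
                        help="cap on total summary length, in tokens")
    parser.add_argument("--days", type=int, default=7,
                        help="only include articles from the last N days")
    parser.add_argument("--topic", default="product-development",
                        help="topic focus used to filter articles")
    return parser
```

The flags can be combined in one invocation, e.g. `--max-tokens 300 --days 3 --topic product-development`.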

Scripts and Resources

Scripts

  • scripts/scraper.py: the main scraping and information-extraction script
  • scripts/example.py: example script (may be deleted or modified)

References

  • references/websites.txt: the list of websites to scrape
  • references/api_reference.md: API reference documentation (work in progress)

Assets

  • assets/: asset folder (for templates and other static resources)
