image-collector

v1.0.0

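Image collection tool for the AI Tech Daily. It automatically collects news illustrations from official sources and supports watermark detection, quality checks, and relevance validation.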

by luna @lunadelo

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lunadelo/image-collector.

Prompt preview (Install & Setup):
Install the skill "image-collector" (lunadelo/image-collector) from ClawHub.
Skill page: https://clawhub.ai/lunadelo/image-collector
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install image-collector

ClawHub CLI

npx clawhub@latest install image-collector
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (image collection, watermark/quality/relevance checks) align with the provided scripts and README. The code implements search/download, watermark/quality checks, and local optimization; required Python packages (Pillow, requests) are consistent with the task.
Instruction Scope
SKILL.md instructs running the included scripts and (optionally) calling a local web-access service for screenshots. The scripts perform HTTP requests to external sites, download image bytes, and write them to the workspace /home/node/.openclaw/workspace/article-images. This behavior is expected for an image collection tool but means network I/O and filesystem writes will occur; the skill does not attempt to read unrelated system files or environment secrets.
Install Mechanism
No install spec; code is shipped as plain Python scripts. No remote installers or archive downloads are used. Dependencies are standard Python packages (Pillow, requests) and a simple shell dependency check script.
Credentials
The skill requests no environment variables or credentials. The scripts do set and use a fixed workspace path for output but do not require secret access tokens. External network requests are limited to image/search endpoints and hardcoded example URLs.
Persistence & Privilege
Skill is not always-enabled and does not request elevated platform privileges. It writes files into its own workspace directory and does not modify other skills or global agent configuration.
Assessment
This skill appears coherent for automated image collection: it will make outbound HTTP requests and save downloaded images under /home/node/.openclaw/workspace/article-images. Before running it, review the full collect_images.py (the provided content was truncated in the listing) to confirm there are no unexpected network endpoints or commands. If you care about privacy or network egress, run it in a sandboxed environment or inspect/limit allowed domains. Also review the hardcoded lists of example image URLs and the use of the jina.ai proxy search URL (r.jina.ai) if you do not want search queries routed through third-party proxies.

Runtime requirements

📸 Clawdis
latest vk977v7ar1r7nxrc2dh0b5sby3h8419cp
93 downloads
0 stars
1 version
Updated 3w ago
v1.0.0
MIT-0

image-collector Skill

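Overview

Automatically collects illustrations for AI Tech Daily news items, ensuring that:

  • Images are strongly tied to the content: official sources are collected from first
  • No watermarks: watermarked images are detected and filtered out automatically
  • High quality: resolution ≥ 800x600, no visible compression artifacts
  • No arbitrary illustrations: relevance is validated and random images are rejected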

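Quick Start

# Check dependencies
bash ~/.openclaw/workspace/skills/image-collector/scripts/check-deps.sh

Basic Usage

# Collect images for a single news item
# (the example headline reads roughly "Apple's China-market AI leaks in the early hours")
python3 ~/.openclaw/workspace/skills/image-collector/scripts/collect_images.py \
  --news "苹果国行 AI 凌晨偷跑" \
  --keywords "Apple,Intelligence,Baidu" \
  --source "apple.com"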

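Image Source Priority

| Priority | Source type             | Example domains / method                        |
| P0       | Official press releases | apple.com, microsoft.com, openai.com            |
| P1       | Authoritative media     | 36kr.com, bloomberg.com, reuters.com            |
| P2       | Product screenshots     | Captured manually (using the web-access skill)  |
| P3       | Self-made charts        | Python matplotlib / Excel                       |
| Banned   | Random stock images     | unsplash.com, pixabay.com                       |
| Banned   | WeChat images           | mmbiz.qpic.cn                                   |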
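
The priority table can be expressed as a small lookup. The following is a minimal sketch, not the logic in collect_images.py: the names `SOURCE_TIERS`, `BANNED_DOMAINS`, and `classify_source` are hypothetical, and only the domains come from the table. P2 and P3 are manual sources, so they have no domain entries here.

```python
from urllib.parse import urlparse

# Domains copied from the priority table; the data structure is an assumption.
SOURCE_TIERS = {
    "P0": {"apple.com", "microsoft.com", "openai.com"},
    "P1": {"36kr.com", "bloomberg.com", "reuters.com"},
}
BANNED_DOMAINS = {"unsplash.com", "pixabay.com", "mmbiz.qpic.cn"}


def _matches(host, domain):
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def classify_source(url):
    """Return 'P0', 'P1', 'banned', or None for a domain not in the table."""
    host = (urlparse(url).hostname or "").lower()
    # Check the banned list first so a banned host is never accepted.
    if any(_matches(host, d) for d in BANNED_DOMAINS):
        return "banned"
    for tier, domains in SOURCE_TIERS.items():
        if any(_matches(host, d) for d in domains):
            return tier
    return None
```

Matching on the full host suffix (rather than a substring) keeps a look-alike such as notapple.com from being treated as apple.com.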

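Validation Pipeline

  1. Source validation → collect only from whitelisted sources
  2. Watermark detection → check the four corners and the bottom edge, filter out watermarked images
  3. Quality check → resolution ≥ 800x600, normal aspect ratio
  4. Relevance validation → score keyword matches in the file name
  5. ✅ Final output → the optimized image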
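
Step 4 describes scoring keyword matches in the file name. One possible reading is sketched below, assuming the score is the fraction of keywords that appear in the URL's file name; the function name `keyword_score` and the scoring formula are hypothetical, and the actual script may weight matches differently.

```python
import posixpath
from urllib.parse import unquote, urlparse


def keyword_score(image_url, keywords):
    """Fraction of keywords (case-insensitive) found in the URL's file name."""
    path = unquote(urlparse(image_url).path)
    name = posixpath.basename(path).lower()
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    if not wanted:
        return 0.0
    hits = sum(1 for k in wanted if k in name)
    return hits / len(wanted)


# Keywords in the same comma-separated form as the --keywords flag.
score = keyword_score(
    "https://www.apple.com/newsroom/images/Apple-Intelligence-hero.jpg",
    "Apple,Intelligence,Baidu".split(","),
)
```

A caller would compare the score against a threshold of its choosing; the document does not state one.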

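Using with the web-access skill

When automatic collection fails:

# 1. Open the official site with web-access to take a screenshot
curl -s "http://localhost:3456/new?url=https://www.apple.com/newsroom"

# 2. After capturing, save the screenshot manually to article-images/

# 3. Optimize it with image-collector
python3 collect_images.py --optimize /tmp/screenshot.png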

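Validation Criteria

1. Relevance validation

  • The image subject matches the news headline
  • The image contains the news keywords
  • The image source is related to the subject of the news

2. Watermark check

  • No logo watermark in any of the four corners
  • No WeChat official account name along the bottom
  • No obvious copyright mark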
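
The corner and bottom checks imply a set of regions to inspect. The sketch below only computes those regions; it does not detect watermarks. The function name `watermark_regions` and the 15% / 10% region sizes are assumptions, not values taken from the skill. The boxes use the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
def watermark_regions(width, height, corner_pct=15, bottom_pct=10):
    """Return named (left, upper, right, lower) boxes to inspect for watermarks.

    corner_pct and bottom_pct are illustrative defaults (percent of the
    image size), not thresholds from the original script.
    """
    cw = max(1, width * corner_pct // 100)   # corner box width
    ch = max(1, height * corner_pct // 100)  # corner box height
    bh = max(1, height * bottom_pct // 100)  # bottom strip height
    return {
        "top_left": (0, 0, cw, ch),
        "top_right": (width - cw, 0, width, ch),
        "bottom_left": (0, height - ch, cw, height),
        "bottom_right": (width - cw, height - ch, width, height),
        "bottom_strip": (0, height - bh, width, height),
    }


regions = watermark_regions(1000, 800)
```

Each box could then be cropped and passed to whatever detection heuristic is in use, for example an edge-density or text check.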

3. Quality check

  • Resolution ≥ 800x600
  • No visible compression artifacts
  • Normal colors
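
The resolution rule can be checked from the image dimensions alone. This is a minimal sketch: the 800x600 minimum comes from the criteria above, while the function name `passes_quality` and the aspect-ratio bounds (0.5 to 3.0) are assumptions standing in for "normal aspect ratio". Compression artifacts and color are not checked here.

```python
def passes_quality(width, height, min_width=800, min_height=600,
                   min_ratio=0.5, max_ratio=3.0):
    """Return True if the size meets the minimum and the aspect ratio is sane.

    min_ratio and max_ratio are illustrative bounds, not documented values.
    """
    if width < min_width or height < min_height:
        return False
    ratio = width / height
    return min_ratio <= ratio <= max_ratio
```

With Pillow, the inputs would typically come from `Image.open(path).size`, which returns a `(width, height)` tuple.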

Author

九万
