Save Article With Images

v1.0.1

Save web articles locally with images. Automatically downloads images, generates Markdown, and converts to PDF. Supports WeChat Official Account articles via...

by Benjiamin Jason (@barryqin9999)

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for barryqin9999/save-article-with-images.

Install the skill "Save Article With Images" (barryqin9999/save-article-with-images) from ClawHub.
Skill page: https://clawhub.ai/barryqin9999/save-article-with-images
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install save-article-with-images

ClawHub CLI

npx clawhub@latest install save-article-with-images

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

Name and description match the included Python script and SKILL.md (scrape article, download images, produce Markdown/PDF). However SKILL.md instructs use of tools (pandoc, weasyprint, browser actions, Feishu messaging, Jina Reader) and platform integrations that are not declared in requirements; the included script only implements a WeChat-specific scraper and does not implement Feishu upload. This mismatch (claimed integrations vs actual footprint) is unexpected and should be clarified.

Instruction Scope

Instructions include sending page content to third-party Jina Reader (curl https://r.jina.ai/URL) and browser eval actions that capture whole page text and images. The SKILL.md also instructs sending output to Feishu but no credentials or secure handoff are defined. The Python script fetches a hardcoded WeChat article URL and performs network downloads and file writes; it will send requests to mp.weixin.qq.com and external image hosts. These external network calls and the potential for sending full article content to third-party services are notable data-flow risks and not explicitly declared to the user.

ℹ

Install Mechanism

No install spec (instruction-only) — lower install risk. But runtime requirements are implied (pandoc, weasyprint, Python with requests and BeautifulSoup). Those dependencies are not declared; running the included script will require Python packages and will attempt file I/O. Lack of an install spec is acceptable but the runtime dependency list should be documented.

Credentials

The skill declares no required environment variables or credentials, yet SKILL.md expects Feishu messaging and possibly browser automation (which generally require tokens or a configured connector). The code writes files under /home/admin/.openclaw/workspace (hardcoded) rather than the SKILL.md's '~/.openclaw/workspace', which can surprise users. Requesting zero credentials while instructing use of external services is disproportionate and unclear.

✓

Persistence & Privilege

Skill is not always-enabled and does not request elevated platform privileges. It writes files to the local filesystem (user workspace) which is expected behavior for a saver/clipper. It does not modify other skills or system-wide settings.

What to consider before installing

This skill appears to implement article scraping and image download, but several issues require attention before use:

- External data flows: The SKILL.md recommends using Jina Reader (r.jina.ai), which sends the target URL/content to a third party. If article content is sensitive, do not use that option.
- Undeclared runtime dependencies: The instructions and code expect pandoc, weasyprint, and Python packages (requests, bs4). The package declares no install steps; prepare these dependencies yourself or sandbox execution.
- Missing/undeclared credentials: SKILL.md describes sending results to Feishu, but the skill declares no Feishu credentials or integration steps. Sending will not work without additional configuration and may leak data if misconfigured.
- Hardcoded paths and URL: scripts/save_wechat.py uses a hardcoded WeChat article URL and writes to /home/admin/.openclaw/workspace/..., which may not match your environment and could create files in unexpected locations. Treat the script as an example and inspect/modify paths/URL before running.

Recommended actions: review the Python script line-by-line, run it in a restricted/sandboxed environment, remove or change the hardcoded URL/path, document and supply any required credentials securely, and avoid the Jina Reader option if you cannot send article text to a third party.

If you need certainty about what this skill will do in your environment, ask the author for a dependency list, a non-hardcoded configuration, and clear instructions for Feishu integration.

Like a lobster shell, security has layers — review code before you run it.

112 downloads

0 stars

3 versions

Updated 1mo ago

MIT-0

Save Article with Images

Save web articles to local storage, supporting articles with images. Automatically downloads images, generates Markdown, and converts to PDF.

Triggers

"save article"
"save this article"
"download article"
"clip article"

Quick Execution

Articles Without Images

1. Fetch article content (Jina Reader or browser)
2. Save to saved-articles/{title}-{date}.md
3. Send file to Feishu

Articles With Images

1. Create directory reports/{article-name}/
2. Create images/ subdirectory
3. Download all images to images/
4. Generate Markdown (relative path references)
5. Convert to PDF
6. Send PDF to Feishu

Complete Workflow

Step 1: Check if Article Has Images

Methods:

Jina Reader output contains ![Image](URL) references
Or the original webpage contains <img> tags

Decision:

Images < 3 → Save Markdown directly, don't download images separately
Images ≥ 3 → Process with image workflow
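
The decision rule above can be sketched in Python. This is a minimal illustration, not code shipped with the skill: the names `extract_image_urls` and `choose_workflow` are hypothetical, and counting each distinct URL once is an assumption the source does not state.

```python
import re

# Matches Markdown image syntax: ![alt](url) or ![alt](url "title")
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)')
IMAGE_THRESHOLD = 3  # threshold taken from the decision rule above


def extract_image_urls(markdown: str) -> list[str]:
    """Return image URLs in document order, counting each URL once (assumption)."""
    seen: list[str] = []
    for url in MD_IMAGE.findall(markdown):
        if url not in seen:
            seen.append(url)
    return seen


def choose_workflow(markdown: str) -> str:
    """Return 'images' for 3 or more images, otherwise 'simple'."""
    count = len(extract_image_urls(markdown))
    return "images" if count >= IMAGE_THRESHOLD else "simple"
```

Calling `choose_workflow` on the Jina Reader output would then select between the two Quick Execution paths.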

Step 2: Create Directory Structure

mkdir -p ~/.openclaw/workspace/reports/{article-name}/images/

Directory Structure:

reports/{article-name}/
├── {article-name}.md      # Markdown file
├── {article-name}.html    # HTML intermediate (optional)
├── {article-name}.pdf     # Final output (optional)
└── images/                # Image directory
    ├── image1.jpg
    ├── image2.png
    └── ...
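
For scripted use, the same layout could be created with `pathlib`. The sketch below is illustrative only; `create_article_dirs` is a hypothetical helper, and the workspace root is passed in explicitly rather than hardcoded.

```python
from pathlib import Path


def create_article_dirs(article_name: str, workspace: Path) -> dict[str, Path]:
    """Create reports/{article-name}/images/ and return the planned file paths."""
    article_dir = workspace / "reports" / article_name
    images_dir = article_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return {
        "dir": article_dir,
        "images": images_dir,
        "markdown": article_dir / f"{article_name}.md",
        "html": article_dir / f"{article_name}.html",
        "pdf": article_dir / f"{article_name}.pdf",
    }
```

A call such as `create_article_dirs("my-article", Path.home() / ".openclaw" / "workspace")` mirrors the `mkdir -p` command in this step.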

Step 3: Fetch Article Content

Method A: Jina Reader (Recommended)

curl -s "https://r.jina.ai/URL"

Pros: Auto-converts to Markdown, extracts image links
Cons: Some sites block Jina Reader
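
A rough Python equivalent of the curl call, using only the standard library, might look like the following. The function names are hypothetical, and whether the service accepts Python's default User-Agent is not confirmed by the source; like the curl command, this sends the article URL to a third party.

```python
import urllib.request

JINA_PREFIX = "https://r.jina.ai/"


def jina_reader_url(article_url: str) -> str:
    """Prefix the article URL the same way the curl command does."""
    return JINA_PREFIX + article_url


def fetch_markdown(article_url: str, timeout: float = 30.0) -> str:
    """Fetch the Markdown rendering; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(jina_reader_url(article_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the fetch raises, the browser method below is the fallback named in the Error Handling table.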

Method B: Browser Fetch

# Open webpage
browser action=open url=URL

# Get content
browser action=act kind=evaluate fn='() => document.body.innerText'

# Get images
browser action=act kind=evaluate fn='() => {
  const imgs = document.querySelectorAll("img");
  return JSON.stringify(Array.from(imgs).map(img => ({
    src: img.src,
    alt: img.alt
  })));
}'
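
When raw HTML is already available (for example, saved from the browser), the same src/alt list can be collected offline. This is a sketch under assumptions: `ImgCollector` and `extract_images` are hypothetical names, and the `data-src` fallback for lazy-loaded images is an assumption not taken from the source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect src and alt from every <img> tag, resolving relative URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Assumption: lazy-loading pages may keep the real URL in data-src.
        src = attr.get("src") or attr.get("data-src")
        if src:
            self.images.append(
                {"src": urljoin(self.base_url, src), "alt": attr.get("alt") or ""}
            )


def extract_images(html: str, base_url: str) -> list[dict[str, str]]:
    """Return [{'src': ..., 'alt': ...}, ...] in document order."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    return parser.images
```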

Step 4: Download Images

Single Image:

curl -o "images/image1.jpg" "https://example.com/image.jpg"

Batch Download (Python):

import requests
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_EXTS = {'jpg', 'jpeg', 'png', 'gif', 'webp'}

def download_images(image_urls, output_dir):
    """Download each URL to output_dir as image1.ext, image2.ext, ..."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i, url in enumerate(image_urls, 1):
        try:
            # Read the extension from the URL path only, so query strings,
            # fragments, and dots in the host name are ignored; default to jpg
            ext = Path(urlparse(url).path).suffix.lstrip('.').lower()
            if ext not in ALLOWED_EXTS:
                ext = 'jpg'

            # Download with a browser-like User-Agent
            resp = requests.get(url, timeout=30, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
            })

            if resp.status_code == 200:
                filename = f"image{i}.{ext}"
                (output_dir / filename).write_bytes(resp.content)
                print(f"✅ {filename}")
            else:
                print(f"❌ HTTP {resp.status_code}: {url}")
        except (requests.RequestException, OSError) as e:
            # Network and file-system errors are reported; the loop continues
            print(f"❌ {e}: {url}")

# Usage
# download_images(['url1', 'url2'], 'images/')

Image Naming:

Sequential: image1.jpg, image2.png, ...
By content: cover.jpg, screenshot.png, ...
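
Sequential names need an extension even when the URL path has none. The sketch below shows one way to infer it; `guess_extension` and `sequential_name` are hypothetical helpers, the `format` parameter follows the Twitter/X URL shape shown later in this document, and `wx_fmt` is an assumption about WeChat image URLs.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse

ALLOWED = {"jpg", "jpeg", "png", "gif", "webp"}


def guess_extension(url: str, default: str = "jpg") -> str:
    """Infer an image extension from the URL path, then from query parameters."""
    parsed = urlparse(url)
    ext = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if ext in ALLOWED:
        return ext
    query = parse_qs(parsed.query)
    # "wx_fmt" is an assumed parameter name, not confirmed by the source.
    for key in ("format", "wx_fmt"):
        value = query.get(key, [""])[0].lower()
        if value in ALLOWED:
            return value
    return default


def sequential_name(index: int, url: str) -> str:
    """Build names such as image1.jpg, image2.png."""
    return f"image{index}.{guess_extension(url)}"
```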

Step 5: Generate Markdown

Template:

# {Article Title}

> Source: {URL}
> Author: {author}
> Published: {date}

---

![Cover](images/image1.jpg)

{Content}

---

## Images

![Figure 1: {description}](images/image2.jpg)
![Figure 2: {description}](images/image3.png)

---

*Saved: {timestamp}*

Image Reference Format:

![Description](images/filename.ext)
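
Filling the template can be automated. The following is a minimal sketch, assuming the first image is the cover and the remaining images are listed as numbered figures; `render_markdown` and its parameters are hypothetical, and the timestamp format is an assumption.

```python
from datetime import datetime


def render_markdown(title, source_url, author, published, content, images, saved_at=None):
    """Fill the template above.

    images is a list of (relative_path, description) tuples; the first one
    becomes the cover and the rest go under '## Images'.
    """
    saved_at = saved_at or datetime.now()
    lines = [f"# {title}", "", f"> Source: {source_url}", f"> Author: {author}",
             f"> Published: {published}", "", "---", ""]
    if images:
        lines += [f"![Cover]({images[0][0]})", ""]
    lines += [content, "", "---", ""]
    if len(images) > 1:
        lines += ["## Images", ""]
        for n, (path, description) in enumerate(images[1:], 1):
            lines.append(f"![Figure {n}: {description}]({path})")
        lines += ["", "---", ""]
    lines.append(f"*Saved: {saved_at:%Y-%m-%d %H:%M}*")
    return "\n".join(lines) + "\n"
```

Passing paths such as `images/image1.jpg` keeps the references relative, as the image reference format requires.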

Step 6: Convert to PDF (Optional)

Using Preset Styles:

# CSS file
CSS_FILE=~/.openclaw/workspace/templates/mobile-friendly.css

# Convert to HTML
pandoc {article-name}.md -o {article-name}.html --standalone --css="$CSS_FILE"

# Generate PDF
weasyprint {article-name}.html {article-name}.pdf

PDF Configuration:

Body: 16pt, line-height 1.8
Page: 6×9 inches, margins 1.5cm
Font: Noto Sans CJK SC
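
The contents of mobile-friendly.css are not shown in the source. As a sketch only, a stylesheet matching the configuration listed above could be generated like this; `build_pdf_css` is a hypothetical helper, and the `sans-serif` fallback font is an assumption.

```python
def build_pdf_css(font_family="Noto Sans CJK SC", font_size_pt=16, line_height=1.8,
                  page_width_in=6, page_height_in=9, margin_cm=1.5):
    """Return a stylesheet string for the listed body, page, and font settings."""
    return (
        # @page sets the paged-media size and margins for the PDF renderer
        f"@page {{ size: {page_width_in}in {page_height_in}in; margin: {margin_cm}cm; }}\n"
        f'body {{ font-family: "{font_family}", sans-serif; '
        f"font-size: {font_size_pt}pt; line-height: {line_height}; }}\n"
        # Cap image width so wide images stay inside the page
        "img { max-width: 100%; height: auto; }\n"
    )
```

Writing the returned string to a .css file and passing it to pandoc via `--css` would apply these settings.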

⚠️ Image Overflow Solution (Important)

Problem: Images wider than the page (e.g., 1200px) overflow the PDF page width (~432pt, i.e. 6 inches).

Solution: Create a CSS file that limits image max-width.

Required CSS:

/* Prevent image overflow */
img {
  max-width: 100%;
  height: auto;
  display: block;
  margin: 1em auto;
}

/* Images in images/ directory - 90% width */
img[src^="images/"] {
  max-width: 90%;
  margin: 0.5em auto;
}

/* Body styles */
body {
  max-width: 100%;
  padding: 1cm;
}

Correct PDF Generation Flow:

# 1. Create CSS file (in article directory)
cat > style.css << 'EOF'
img { max-width: 100%; height: auto; }
img[src^="images/"] { max-width: 90%; }
EOF

# 2. Generate HTML with CSS
pandoc {article-name}.md -o {article-name}.html --standalone --css=style.css

# 3. Generate PDF
weasyprint {article-name}.html {article-name}.pdf

Key Points:

✅ Must add max-width: 100% or max-width: 90%
✅ Use relative paths images/xxx.jpg
❌ Don't render images at original size (will overflow)
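
The three-step flow above could be wrapped in Python as follows. This is a sketch that assumes pandoc and weasyprint are installed and on PATH; `build_commands` and `convert_to_pdf` are hypothetical names. Running both tools inside the article directory keeps the relative `images/` paths valid.

```python
import subprocess
from pathlib import Path

IMAGE_CSS = (
    "img { max-width: 100%; height: auto; }\n"
    'img[src^="images/"] { max-width: 90%; }\n'
)


def build_commands(article_name: str, css_name: str = "style.css") -> list[list[str]]:
    """Return the pandoc and weasyprint argument lists, in run order."""
    md, html, pdf = (f"{article_name}.{ext}" for ext in ("md", "html", "pdf"))
    return [
        ["pandoc", md, "-o", html, "--standalone", f"--css={css_name}"],
        ["weasyprint", html, pdf],
    ]


def convert_to_pdf(article_dir: Path, article_name: str) -> Path:
    """Write style.css, then run both tools inside the article directory."""
    (article_dir / "style.css").write_text(IMAGE_CSS, encoding="utf-8")
    for cmd in build_commands(article_name):
        # check=True raises CalledProcessError if a tool exits non-zero
        subprocess.run(cmd, cwd=article_dir, check=True)
    return article_dir / f"{article_name}.pdf"
```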

Step 7: Send to Feishu

Send Markdown:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.md"

Send PDF:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.pdf"

Platform-Specific Handling

Source	Fetch Method	Image Handling
Twitter/X	Jina Reader	Download pbs.twimg.com images
WeChat Official Account	browser + Camoufox	Download mmbiz.qpic.cn images
General Webpages	Jina Reader	Download all img tags
Login Required Sites	browser	User takes screenshots manually

Twitter/X Articles

Image URL Format:

https://pbs.twimg.com/media/XXXXX?format=jpg&name=small

Download Command:

# Get best quality
curl -o "images/image1.jpg" "https://pbs.twimg.com/media/XXXXX?format=jpg&name=large"
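
Switching `name=small` to `name=large` can be done programmatically before downloading. A minimal sketch, where `with_size` is a hypothetical helper:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_size(url: str, size: str = "large") -> str:
    """Return the URL with its name= query parameter set to the given size."""
    parsed = urlparse(url)
    params = dict(parse_qsl(parsed.query))  # keeps the original parameter order
    params["name"] = size
    return urlunparse(parsed._replace(query=urlencode(params)))
```

The other query parameters, such as `format=jpg`, are left unchanged.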

WeChat Official Account Articles

Problem: WeChat applies anti-hotlinking protection, so direct image downloads fail.

Solutions:

Open the article in the browser
Save a screenshot
Or use the Camoufox tool

# Use tool from agent-reach
cd ~/.agent-reach/tools/wechat-article-for-ai
python3 main.py "https://mp.weixin.qq.com/s/ARTICLE_ID"

Checklist

After saving, verify:

□ Markdown file generated
□ All images downloaded successfully
□ Image relative paths correct
□ Images display correctly (local preview)
□ PDF generated successfully (optional)
□ File sent to Feishu
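
The file-related items in this checklist can be checked automatically. The sketch below covers only the Markdown file and its image references; `verify_article` is a hypothetical helper, and visual preview and Feishu delivery still need a manual check.

```python
import re
from pathlib import Path

# Matches Markdown image syntax and captures the path or URL
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)')


def verify_article(article_dir: Path, article_name: str) -> list[str]:
    """Return a list of problems; an empty list means the file checks passed."""
    md_path = article_dir / f"{article_name}.md"
    if not md_path.is_file():
        return [f"missing Markdown file: {md_path.name}"]
    problems = []
    for ref in MD_IMAGE.findall(md_path.read_text(encoding="utf-8")):
        if ref.startswith(("http://", "https://")):
            problems.append(f"image still points to a remote URL: {ref}")
        elif not (article_dir / ref).is_file():
            problems.append(f"image file not found: {ref}")
        elif (article_dir / ref).stat().st_size == 0:
            problems.append(f"image file is empty: {ref}")
    return problems
```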

Error Handling

Error	Cause	Solution
Image download failed	Anti-hotlinking/Network	Fetch via browser or request a lower-quality image
PDF generation failed	Missing fonts/dependencies	Check weasyprint installation
Markdown images not showing	Path error	Check relative paths
Jina Reader blocked	Site restriction	Use browser fetch

File Locations

Type	Directory
Simple articles	`saved-articles/{title}-{date}.md`
Articles with images	`reports/{article-name}/`
Temporary files	`/tmp/article-{id}/`
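
Article titles often contain characters that are not safe in file names. The sketch below builds the simple-article path from the table above; `safe_title` and `simple_article_path` are hypothetical helpers, and the ISO date format for `{date}` is an assumption, since the source does not specify one.

```python
import re
from datetime import date
from pathlib import Path


def safe_title(title: str, max_len: int = 80) -> str:
    """Replace unsafe file-name characters and whitespace runs with a hyphen."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "-", title.strip())
    return cleaned.strip("-")[:max_len] or "untitled"


def simple_article_path(workspace: Path, title: str, day: date) -> Path:
    """Return saved-articles/{title}-{date}.md under the workspace."""
    return workspace / "saved-articles" / f"{safe_title(title)}-{day.isoformat()}.md"
```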

Skill Version: 1.0.0
Created: 2026-03-17
