Install
```bash
openclaw skills install glmv-pdf-to-web
```

Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured outline.json is saved, images are cropped locally, and the final page is saved with generate_web.py.
Scripts are in: {SKILL_DIR}/scripts/
Python packages (install once):

```bash
pip install pymupdf pillow
```
System tools: curl (pre-installed on macOS/Linux).
Trigger when the user asks to create a webpage or project page from a PDF — phrases like: "make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.
All output goes under {WORKSPACE}/web/<pdf_stem>_<timestamp>/:
```
web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json               ← structured web plan (WebPlan schema)
    ├── crops/                     ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html                 ← the website
```
- `<pdf_stem>` = PDF filename without extension
- `<timestamp>` = format YYYYMMDD_HHMMSS
- Crops are saved as `crops/<name>_crop.png`
- `$ARGUMENTS` is the path to the PDF file (local) or an HTTP/HTTPS URL
```python
import os, datetime

pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")
```

```bash
mkdir -p "<out_dir>/crops"
```
If the input is a URL, download it first:
```bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
```
Then convert (pass either the downloaded path or the original local path):
```bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
```
Outputs JSON to stdout:
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
Parse and store the full page → path map.
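A minimal sketch of that step, assuming `skill_dir` and `pdf_path` are already set (the variable names are illustrative, not part of the script's interface):

```python
import json
import subprocess

# Run the converter and capture its JSON page listing from stdout.
proc = subprocess.run(
    ["python", f"{skill_dir}/scripts/pdf_to_images.py", pdf_path, "--dpi", "120"],
    capture_output=True, text=True, check=True,
)
pages = json.loads(proc.stdout)

# Map page number -> absolute image path, e.g. {1: "/abs/path/page_001.png"}.
page_to_path = {p["page"]: p["path"] for p in pages}
```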
View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.
While reading, note which figures and tables appear on which pages — the outline's required_images will need those page paths.
Do NOT plan sections yet — read everything first.
Plan the website sections. Standard structure for academic papers (adapt as needed):
| section_id | Purpose |
|---|---|
| hero | Title, authors, venue badge, link buttons |
| abstract | Full abstract text |
| contributions | 3–5 key contribution cards |
| method | Architecture figure + method explanation |
| results | Quantitative table + qualitative figures |
| conclusion | Brief conclusion |
| citation | BibTeX block |
For each section that needs an image, identify the source page, a visual description of the target element, and the reason for using it — these become the required_images entries in the schema below.
Save as <out_dir>/outline.json using exactly this schema:
```json
{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}
```
Field notes:
- `lang`: "Chinese" or "English" — match the PDF language
- `required_images`: empty array `[]` if the section needs no images
- `url`: the local file path of the source page (the `path` field from Phase 1)

Write outline.json using the Write tool to `<out_dir>/outline.json`.
IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.
IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.
Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.
Use the Agent tool like this:
Agent tool call:
description: "Grounding crop page N"
prompt: |
You are a visual grounding and cropping assistant. Your task is to precisely
locate specified visual elements in a page image and crop them out.
## Grounding method
Use visual grounding to locate each target:
1. Read the source image using the Read tool to view it
2. Identify the target element described below
3. Determine its bounding box as normalized coordinates in the 0–999 range:
- 0 = left/top edge of the image
- 999 = right/bottom edge of the image
- These are thousandths, NOT pixels, NOT percentages (0–100)
- Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
- Example: [0, 0, 500, 500] = top-left quarter of the image
4. Be precise: tightly bound the target element with a small margin (~10–20 units)
around it. Do NOT crop too wide or too narrow.
## Source image
<page_image_path>
## Crops needed
For each crop below, first do grounding (locate the element), then crop:
1. Name: "<descriptive_name>"
Target: "<visual_description from outline.json>"
Context: "<usage_reason from outline.json>"
## Crop command
After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
```bash
python <SKILL_DIR>/scripts/crop.py \
--path "<page_image_path>" \
--box X1 Y1 X2 Y2 \
--name "<crop_name>" \
--out-dir "<out_dir>/crops"
```
## Verification
After each crop, READ the output image to visually verify the correct region
was captured. If the crop missed the target or is too wide/narrow, adjust the
coordinates and re-run crop.py.
## Output
Report the final results as a list:
- crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and crop details with actual values from your context.
The crop.py script outputs JSON: {"path": "/abs/path/<name>_crop.png"}
Collect results from all subagents and build the mapping: section_id → [crop filename, ...] to reference in HTML.
Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
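Once every subagent has reported back, a sketch of assembling that mapping (the `subagent_results` structure here is hypothetical; adapt it to how your subagents actually report):

```python
import json
import os

# crop.py prints {"path": "/abs/path/<name>_crop.png"} for each crop;
# subagent_results maps section_id -> the JSON lines collected from its subagent.
section_crops: dict[str, list[str]] = {}
for section_id, crop_json_lines in subagent_results.items():  # hypothetical structure
    for line in crop_json_lines:
        crop_path = json.loads(line)["path"]
        section_crops.setdefault(section_id, []).append(os.path.basename(crop_path))
```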
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
if f.endswith('.png'):
w, h = Image.open(os.path.join(d, f)).size
sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | max-width: 400–500px, centered |
| 0.7 – 1.3 (square-ish) | max-width: 600–700px |
| > 1.3 (wide) | Full-width, max-width: 100% |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
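Applied as code, that table might look like this sketch (thresholds taken from the rows above; the helper name is illustrative):

```python
def img_style(aspect: float) -> str:
    # Pick a CSS width rule from the aspect-ratio table above.
    if aspect < 0.7:    # tall/narrow
        return "display: block; margin: 0 auto; max-width: 450px;"
    if aspect <= 1.3:   # square-ish
        return "display: block; margin: 0 auto; max-width: 650px;"
    # Wide; for very wide crops (> 2.0) wrap in a horizontally scrollable container.
    return "max-width: 100%;"
```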
Step A — Write HTML to /tmp/website.html
- `<img src="...">` must use relative paths: `crops/<name>_crop.png`

Step B — Save:

```bash
python {SKILL_DIR}/scripts/generate_web.py \
  --html-file /tmp/website.html \
  --title "<paper title>" \
  --out-dir "<out_dir>/"
```
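A minimal post-run check, assuming generate_web.py writes index.html into the output directory as shown in the layout above:

```python
import os

index_path = os.path.join(out_dir, "index.html")
assert os.path.isfile(index_path), "generate_web.py did not produce index.html"
print(f"Website saved to {index_path}")
```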
A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.
Page layout:
- max-width 900px, centered, comfortable side padding

Typography:

Visual style:

Section guidelines:
- hero: title, authors, venue badge; link buttons [📄 Paper] [💻 Code] [🗄️ Dataset] — grey out any button with no URL
- abstract: full abstract text
- contributions: 3–5 key contribution cards
- method: architecture figure (`<figure><img><figcaption>`) + prose explanation
- results: `<table>` — use actual numbers from the PDF, best numbers bolded
- conclusion: brief conclusion
- citation: `<pre><code>` BibTeX block reconstructed from PDF metadata, with a copy button in vanilla JS via `navigator.clipboard`

Images:
- `<img>` tags use relative paths: `crops/<name>_crop.png`
- `loading="lazy"` and descriptive `alt` text on every image
- wrap in `<figure>` with `<figcaption>`

Animations (subtle only):
- `IntersectionObserver` + CSS transitions

Final checklist:
- `<pdf_stem>_<timestamp>/` output directory created
- outline.json saved with a valid WebPlan schema
- all crops saved locally under crops/ as `crops/<name>_crop.png`
- generate_web.py called and confirmed success

Match the PDF language. English paper → English website. Chinese paper → Chinese website. No mixing.