Doc Snapshot Agent

v1.0.0

Automatically illustrate Markdown documents by turning image markers into screenshots or generated images, then writing an image-enriched Markdown output. Us...

⭐ 0· 21·0 current·0 all-time

by wangzhiming (@wangzhiming1999)

Security Scan

Capability signals

Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Suspicious


OpenClaw

Suspicious

medium confidence

Purpose & Capability

The skill's name and description match the included files: browser-automation references, site-explorer guidance, and a Python script that calls an image API. However, the registry metadata declares no required environment variables or binaries while SKILL.md and the bundled script explicitly expect: OPENROUTER_API_KEY, Playwright MCP browser tools (mcp__playwright__*), and PLAYWRIGHT_CRED_{SERVICE}_{FIELD} environment variables. The omission of these runtime requirements in the manifest is an incoherence — the declared requirements do not match what the skill actually needs to operate.


Instruction Scope

The SKILL.md instructions are generally scoped to document illustration: parse markers, navigate pages via Playwright MCP, capture screenshots, generate images via OpenRouter, and write output under a project root. They also instruct the agent to read credentials from environment variables and to consult/update site knowledge files in external directories ($IMAGE_AGENT_SITE_KNOWLEDGE_DIR / $IMAGE_AGENT_SITE_LEARNING_DIR). These actions are within the claimed purpose but involve accessing environment variables, networked browser tooling, and local knowledge directories outside the skill package — any use of those should be explicitly authorized by the user. The instructions also recommend inspecting console/network activity for debugging, which can surface additional data about the target site.


Install Mechanism

There is no install spec (instruction-only with a bundled Python helper). No remote downloads or obscure URLs are used. The only bundled executable is scripts/generate_image.py, which requires the 'requests' Python package and an API key to contact openrouter.ai. The install risk is low to moderate, but the runtime must have Python and requests available.

Credentials

Functionally, asking for OPENROUTER_API_KEY (for image generation) and PLAYWRIGHT_CRED_{SERVICE}_{FIELD} (for site logins) is reasonable for this skill. However, the skill's registry metadata lists no required env vars, which is inconsistent and misleading. Also, prompt text and image-generation requests are sent to openrouter.ai (the script uses https://openrouter.ai/api/v1/chat/completions), meaning user prompts (the text describing images) will leave the machine and go to a third-party provider — users should be aware of privacy, policy, and billing implications before supplying API keys or sensitive prompts.


Persistence & Privilege

The skill is not forced-always, does not request system-wide configuration changes, and does not modify other skills. It proposes writing site knowledge and output files under a user-chosen project root and .cache directories, which is normal for a disk-based workflow. There are no elevated persistence privileges requested in the manifest.

What to consider before installing

This skill appears to implement the advertised functionality (Playwright MCP-based screenshots and OpenRouter image generation) but the package metadata does not list the runtime credentials and tools it actually relies on. Before installing or running it:

- Confirm you trust the OpenRouter provider and understand that prompts (image descriptions) will be sent to https://openrouter.ai; check privacy, retention, and billing policies for any API key you provide.
- Do not paste secrets into chat; provide Playwright credentials (PLAYWRIGHT_CRED_{SERVICE}_...) only via secure environment variables or a secrets store, and only if you trust the runtime.
- Ensure the runtime has Playwright MCP configured (mcp__playwright__ tools), Python and the 'requests' package for the bundled script, and that you are comfortable with the skill opening websites and inspecting network/console output.
- Ask the skill author (or maintainer) to update the registry metadata to explicitly declare required env vars (OPENROUTER_API_KEY, any PLAYWRIGHT_CRED_* patterns), and to document any additional config directories (IMAGE_AGENT_SITE_KNOWLEDGE_DIR). This will resolve the current manifest/instruction mismatch.
- If you cannot verify the above, run the skill in an isolated environment (or sandbox) and review generated network logs to confirm no unexpected endpoints are contacted.

Given the manifest omissions, treat this package as suspicious (likely correct functionality but sloppy packaging/misdeclared requirements) rather than outright malicious. If you want higher confidence, request an updated package manifest or test in a controlled environment.

Like a lobster shell, security has layers — review code before you run it.


MIT-0

Doc Snapshot Agent

doc-snapshot-agent is a single entry-point skill for automatically adding images to Markdown documents.

It supports:

browser screenshots for product pages, dashboards, docs sites, and web apps
AI-generated images for conceptual illustrations
incremental reruns and partial regeneration
semantic placement of images into the correct paragraph or section
structured output directories for reusable assets and final Markdown

This package is intentionally published as one main skill plus supporting reference documents:

{baseDir}/references/browser-automation.md
{baseDir}/references/playwright-mcp.md
{baseDir}/references/site-explorer.md
{baseDir}/references/image-generation.md

Load this skill whenever the user asks to:

add images to a Markdown article
process a case file with image markers
capture screenshots for documentation
generate article visuals and insert them into a document
rerun or fix image placement in an already processed document

What This Skill Produces

Input:

a Markdown document containing image markers and optionally an Image Summary table

Output:

captured screenshots in a raw folder
final image assets ready for Markdown references
a generated README with image metadata
an illustrated Markdown file with image markers replaced by real image references

Project Root

All input, output, and cache paths are relative to a single project root directory ({project-root}).

At the very beginning of every run, ask the user which directory to use as the project root. If the user declines or says they have no preference, default to /tmp/doc-snapshot-agent.

Once confirmed, all subsequent paths in this skill (cases/, output/, .cache/, etc.) resolve under {project-root}/.

Recommended Directory Layout

{project-root}/
├── cases/
│   └── {article-id}.md
├── output/
│   ├── {article-id}/
│   │   ├── raw/
│   │   │   ├── A1_example.png
│   │   │   └── A2_example.png
│   │   ├── A1_example.png
│   │   ├── A2_example.png
│   │   └── README.md
│   └── markdowns/
│       └── {article-id}.md
└── .cache/
    └── screenshots/
        └── {article-id}/

Conventions:

{project-root}/cases/ stores the source Markdown file.
{project-root}/output/{article-id}/raw/ stores original browser screenshots and should never be overwritten by later processing.
{project-root}/output/{article-id}/ stores final images referenced by Markdown.
{project-root}/output/markdowns/ stores the final illustrated Markdown.
{project-root}/.cache/screenshots/ stores reusable screenshot cache entries.

If the user specifies a different layout, follow the user instruction instead.
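The recommended layout can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the function name `ensure_layout` and the dictionary keys are hypothetical, and only the directory names and the default root come from the text above.

```python
from pathlib import Path

# Default taken from the "Project Root" section; everything else is illustrative.
DEFAULT_ROOT = "/tmp/doc-snapshot-agent"

def ensure_layout(article_id, project_root=None):
    """Create the recommended directories and return the key paths."""
    root = Path(project_root or DEFAULT_ROOT)
    paths = {
        "source": root / "cases" / f"{article_id}.md",
        "raw": root / "output" / article_id / "raw",
        "final": root / "output" / article_id,
        "markdown": root / "output" / "markdowns" / f"{article_id}.md",
        "cache": root / ".cache" / "screenshots" / article_id,
    }
    # Directories are created; the two Markdown files are only located.
    for key in ("raw", "final", "cache"):
        paths[key].mkdir(parents=True, exist_ok=True)
    paths["source"].parent.mkdir(parents=True, exist_ok=True)
    paths["markdown"].parent.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `ensure_layout("my-article", "/path/chosen/by/user")` would create the tree under that root and leave existing files untouched.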

Credentials

Some sites require authentication before the requested screenshot can be captured.

Read website credentials from environment variables using this pattern:

PLAYWRIGHT_CRED_{SERVICE}_{FIELD}

Examples:

PLAYWRIGHT_CRED_FELO_EMAIL
PLAYWRIGHT_CRED_FELO_PASSWORD

Rules:

read credentials from the environment instead of hardcoding them
never print secrets back to the user
if credentials are missing, tell the user which variable names are required
if the workflow reaches a login, signup, registration, invite, verification, or onboarding gate that needs user-specific information, stop and ask the user how to proceed
do not create new accounts, accept invitations, solve email verification, or invent profile information without explicit user input
after the user provides credentials or instructions, continue from the interrupted step instead of restarting the whole run unless the user asks for a fresh run
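As an illustration of the first three rules, the lookup could be sketched like this in Python. The helper name `load_site_credentials` and the default field list are assumptions; only the `PLAYWRIGHT_CRED_{SERVICE}_{FIELD}` naming pattern comes from this document.

```python
import os

def load_site_credentials(service, fields=("EMAIL", "PASSWORD"), env=None):
    """Return (credentials, missing_variable_names) for one service.

    Values are returned to the caller but never printed or logged here.
    """
    env = os.environ if env is None else env
    creds, missing = {}, []
    for field in fields:
        name = f"PLAYWRIGHT_CRED_{service.upper()}_{field.upper()}"
        value = env.get(name)
        if value:
            creds[field.lower()] = value
        else:
            # Report only the variable name, never a value.
            missing.append(name)
    return creds, missing
```

If `missing` is non-empty, the agent can tell the user exactly which variable names to set and pause instead of guessing.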

Supported Marker Formats

This skill must support both inline markers and summary tables.

Format A: Heading-Based Screenshot Marker

### 📷 Screenshot: {marker-id} ({filename})
Use: {why this screenshot exists}
Processing: {post-processing instruction}
Difference: {optional distinction from similar screenshots}

Fields:

marker-id: unique screenshot identifier such as A1, B3-1, or D3
filename: base filename without the marker prefix
Use: what the screenshot should communicate
Processing: crop, resize, or other post-processing needs
Difference: optional explanation for how this screenshot differs from similar ones
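A parser for Format A might look roughly like the sketch below. It assumes the field lines follow the heading directly with no blank line in between, as in the template above; the function name and the dictionary keys are illustrative.

```python
import re

HEADING_MARKER = re.compile(
    r"^###\s*📷\s*Screenshot:\s*(?P<marker_id>\S+)\s*\((?P<filename>[^)]+)\)\s*$"
)
FIELD_LINE = re.compile(r"^(Use|Processing|Difference):\s*(.*)$")

def parse_heading_markers(markdown):
    """Collect heading-based screenshot markers and their optional fields."""
    markers, current = [], None
    for line in markdown.splitlines():
        heading = HEADING_MARKER.match(line)
        if heading:
            current = {
                "type": "screenshot",
                "marker_id": heading["marker_id"],
                "filename": heading["filename"].strip(),
            }
            markers.append(current)
            continue
        field = FIELD_LINE.match(line) if current else None
        if field:
            current[field.group(1).lower()] = field.group(2).strip()
        else:
            # Any other line (including a blank one) ends the marker block.
            current = None
    return markers
```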

Format B: HTML Comment Image Marker

Screenshot:

<!-- IMAGE: screenshot (https://example.com/app)
Description: Workspace dashboard showing project activity and team sidebar
Filename: workspace-dashboard.png
-->

Generated image:

<!-- IMAGE: generated
Description: Editorial illustration of a collaborative AI workflow with folders and browser windows
Filename: ai-workflow-hero.png
-->
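For Format B, a regular expression over the whole document is one plausible approach. The sketch below assumes each marker looks like the two examples above (a first line with the type and optional URL, then `Key: value` lines); the function name and returned keys are hypothetical.

```python
import re

COMMENT_MARKER = re.compile(
    r"<!--\s*IMAGE:[ \t]*(?P<type>screenshot|generated)[ \t]*"
    r"(?:\((?P<url>[^)]*)\))?[ \t]*\n(?P<body>.*?)-->",
    re.DOTALL,
)

def parse_comment_markers(markdown):
    """Find HTML comment image markers and extract their fields."""
    markers = []
    for match in COMMENT_MARKER.finditer(markdown):
        fields = {}
        for line in match["body"].splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
        markers.append({
            "type": match["type"],
            "url": match["url"],  # None for generated images
            "description": fields.get("description"),
            "filename": fields.get("filename"),
            "span": match.span(),  # character offsets, useful for in-place replacement
        })
    return markers
```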

Image Summary Table

A document may end with a summary table listing all required images:

## Image Summary

| # | Type | Description | Filename |
|---|------|-------------|----------|
| 1 | generated | Description... | `hero.png` |
| 2 | screenshot | Description... | `dashboard.png` |

Important:

the summary table is the complete inventory of requested images
some images may also appear as inline markers in the body
some images may exist only in the summary table and must be placed intelligently during output generation
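Reading the summary table could be sketched as below. This assumes the four-column layout shown above and a heading that starts with `## Image Summary`; the function name and row keys are illustrative.

```python
def parse_image_summary(markdown):
    """Return one dict per data row of the Image Summary table."""
    rows, in_section = [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("## image summary"):
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith("#"):
            break  # the next heading ends the summary section
        if not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Skip the header row, the |---| separator row, and malformed rows.
        if len(cells) != 4 or cells[0] == "#" or set(cells[0]) <= set("-: "):
            continue
        rows.append({
            "index": cells[0],
            "type": cells[1],
            "description": cells[2],
            "filename": cells[3].strip("`"),
        })
    return rows
```

Comparing these filenames with the ones found in inline markers gives the set of images that exist only in the table and still need semantic placement.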

Incremental Execution and Resume Behavior

Do not assume the workflow always starts from zero. Before doing any work, inspect the article state and continue from the right step.

Check Existing Artifacts

For a given article id, inspect:

{project-root}/output/{article-id}/raw/*.png
{project-root}/output/{article-id}/*.png
{project-root}/output/{article-id}/README.md
{project-root}/output/markdowns/{article-id}.md
{project-root}/.cache/screenshots/{article-id}/

Decision Rules

New article: nothing exists -> run the full workflow.
Screenshots exist but Markdown does not: skip screenshot capture and rebuild only the Markdown and README.
Markdown exists and the user asks for fixes: reparse the source document and rebuild image placement without recapturing images.
Some screenshots are missing: capture only the missing ones, then continue.
The user asks to recapture specific images: regenerate only those images, then rebuild the Markdown.
The user asks to start over: ignore caches and rebuild everything from scratch.
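The decision rules can be condensed into a small planning function. The sketch below is one possible reading of them, kept free of filesystem access so it is easy to test; the function name, parameters, and result keys are all hypothetical.

```python
def plan_run(required, existing_raw, markdown_exists,
             recapture=(), fix_placement=False, start_over=False):
    """Decide which screenshots to capture and whether to rebuild the Markdown.

    required      -- filenames the document asks for
    existing_raw  -- filenames already present in output/{article-id}/raw/
    recapture     -- filenames the user explicitly wants regenerated
    """
    required = set(required)
    if start_over:
        # Ignore caches and rebuild everything from scratch.
        return {"capture": sorted(required), "use_cache": False, "rebuild_markdown": True}
    missing = required - set(existing_raw)
    capture = sorted(missing | (set(recapture) & required))
    rebuild = bool(capture) or not markdown_exists or fix_placement
    return {"capture": capture, "use_cache": True, "rebuild_markdown": rebuild}
```

An empty `capture` list with `rebuild_markdown` set corresponds to the cheap path: reuse the screenshots and regenerate only the Markdown and README.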

Core Principles

default to incremental work
reuse screenshots whenever possible
treat Markdown regeneration as cheap and browser work as expensive
tell the user what will be skipped and what will be rerun

Workflow

Step 0: Verify Playwright MCP Server (MANDATORY)

This check MUST run at the start of EVERY execution, not just the first time.

Before any other work, verify that the Playwright MCP server is properly configured and running:

Check for Playwright MCP tools availability
- Attempt to list or detect tools with the mcp__playwright__ prefix
- Required tools include: mcp__playwright__browser_navigate, mcp__playwright__browser_snapshot, mcp__playwright__browser_screenshot

If tools are NOT detected, STOP immediately and guide the user to install:

Detect the current client environment and show the matching installation command:

Claude Code

claude mcp add playwright -- npx @playwright/mcp@latest

Codex

codex mcp add playwright -- npx @playwright/mcp@latest

VS Code / Cursor / Kiro (IDE with MCP settings UI)

Add to the MCP settings JSON (e.g. .vscode/mcp.json, .cursor/mcp.json, .kiro/settings/mcp.json):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Standalone MCP Server (headless environments or worker processes)

npx @playwright/mcp@latest --port 8931

Then point the client config to:

{
  "mcpServers": {
    "playwright": {
      "url": "http://localhost:8931/mcp"
    }
  }
}

Grant Tool Permissions (Claude Code / Codex)

{
  "permissions": {
    "allow": ["mcp__playwright__*"]
  }
}

Ask the user to configure and restart the session
Do NOT proceed to Step 1 until this check passes

Step 0.5: Confirm the Project Root

After verifying Playwright MCP, ask the user:

Which directory should I use as the project root for this run?

If the user provides a path, use it as {project-root}.
If the user says "no preference", skips the question, or does not answer, use /tmp/doc-snapshot-agent.

Create the directory if it does not exist. All subsequent paths (cases/, output/, .cache/, scripts/, references/) resolve under {project-root}/.

Step 1: Parse the Document and Build the Image Inventory

Read the source Markdown and merge image requirements from three sources:

inline heading-based screenshot markers
inline HTML comment image markers (the IMAGE comment blocks of Format B)
the Image Summary table

For each image, record:

type: screenshot or generated
filename
marker id if present
description or purpose text
source URL if present
post-processing instruction if present
exact location in the Markdown when there is an inline marker
whether the image still needs semantic placement

Also detect the target website or websites mentioned by the article.

Step 2: Prepare the Environment

ensure output directories exist
check screenshot cache for reusable images
load credentials from environment variables
confirm Playwright MCP tools are available — this skill REQUIRES Playwright MCP for all browser interactions
if Playwright MCP tools are not detected, stop and ask the user to configure the MCP server (see Step 0)
review {project-root}/references/playwright-mcp.md before interacting with the site
if the Chromium browser runtime is not installed, run npx playwright install chromium before continuing
if the target flow requires login or registration and the required credentials or account details are not already available, pause and ask the user before taking any account-specific action

CRITICAL: Browser Tool Requirement

This skill uses only Playwright MCP tools for browser automation. Do NOT use:

direct Playwright library calls
generic browser navigation tools that are not part of the Playwright MCP server
any tool that does not have the mcp__playwright__* prefix

All browser interactions must go through the Playwright MCP server tools:

mcp__playwright__browser_navigate
mcp__playwright__browser_snapshot
mcp__playwright__browser_screenshot
mcp__playwright__browser_click
mcp__playwright__browser_fill_form
etc.

If these tools are not available in the current runtime, the workflow cannot proceed. Ask the user to configure the Playwright MCP server first.

Step 2.5: Understand the Target Website Before Taking Screenshots

Bad screenshots usually come from navigating to the wrong page, not from using the wrong screenshot command.

Before capturing anything:

Check whether site knowledge already exists under:
- $IMAGE_AGENT_SITE_KNOWLEDGE_DIR/
- $IMAGE_AGENT_SITE_LEARNING_DIR/
Derive a stable site-key from the domain name:
- memclaw.me -> memclaw
- app.felo.ai -> felo
If {site-key}.md exists and is recent, read it before browsing.
If site knowledge is missing or stale, perform a structured site exploration and save the findings into the site knowledge files. See {project-root}/references/site-explorer.md.
Map every screenshot description to a specific page or state.
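Deriving the site-key could be sketched as below. The rule used here (take the label just before the top-level domain) is an assumption that reproduces the two examples above; it would give the wrong answer for multi-part suffixes such as `example.co.uk`, which would need a public-suffix list.

```python
from urllib.parse import urlparse

def site_key(url_or_domain):
    """Derive a short, stable key from a URL or bare domain name."""
    # urlparse only fills in hostname when a scheme is present,
    # so fall back to the raw input for bare domains.
    host = urlparse(url_or_domain).hostname or url_or_domain
    labels = host.lower().strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]
```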

Common mapping mistakes:

taking a marketing homepage when the document actually asks for an authenticated workspace
taking a broad landing page when the description clearly asks for a specific panel or feature
ignoring keywords like dashboard, session history, team members, or invite

Write a screenshot navigation plan for each image:

target URL or click path
key elements that must be visible
whether scrolling, expanding, or tab switching is required

If new knowledge is discovered while browsing, append it to the site knowledge files so future runs do not repeat the same mistakes.

Step 3: Capture Browser Screenshots

Use the browser automation reference in {project-root}/references/browser-automation.md.

If Playwright MCP is available, also use {project-root}/references/playwright-mcp.md as the concrete execution guide for:

opening pages
reading the accessibility snapshot before acting
filling login forms
waiting for UI state changes
taking viewport, element, or full-page screenshots
checking console and network output when a page behaves unexpectedly

Typical flow:

open the target website
log in if required
navigate to the correct page or state for each screenshot
wait for key content to load
resize the viewport if needed
save screenshots to {project-root}/output/{article-id}/raw/

Naming rule:

if a marker id exists, save as {marker-id}_{filename}
otherwise use the original filename

Example:

A1_workspace-dashboard.png

After taking each screenshot, verify that the captured image actually matches the description. Do not rely only on DOM text. Visual layout, modals, loading states, overlays, and empty panels must be checked against the real screenshot file.

Step 4: Post-Process Screenshots

Apply the requested processing instructions if present.

Typical operations:

crop
resize
aspect-ratio adjustment
copy from raw/ into the final output directory

Principle:

raw/ keeps untouched originals
final images in {project-root}/output/{article-id}/ are the assets referenced by Markdown

Step 5: Generate the Illustrated Markdown

This step has two jobs:

replace inline markers exactly where they appear
place unanchored images from the summary table into the most relevant paragraph

1. Replace Inline Markers In Place

Heading marker example:

### 📷 Screenshot: A1 (workspace-dashboard.png)
Use: Show the authenticated workspace homepage
Processing: Full-width screenshot

becomes:

![Authenticated workspace homepage](../{article-id}/A1_workspace-dashboard.png)

HTML comment marker example:

<!-- IMAGE: screenshot (https://example.com/app)
Description: Workspace dashboard showing Architecture Decisions
Filename: architecture-decisions.png
-->

becomes:

![Workspace dashboard showing Architecture Decisions](../{article-id}/architecture-decisions.png)
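The in-place replacement of HTML comment markers could be sketched as follows. Both function names are hypothetical; the relative path `../{article-id}/` and the `{marker-id}_{filename}` naming rule come from this document. For heading markers the alt text is passed in by the caller, since the example above rephrases the Use line rather than copying it.

```python
import re

COMMENT_MARKER = re.compile(r"<!--\s*IMAGE:.*?-->", re.DOTALL)

def image_reference(article_id, filename, alt_text, marker_id=None):
    """Build the Markdown image reference used in the illustrated output."""
    name = f"{marker_id}_{filename}" if marker_id else filename
    return f"![{alt_text}](../{article_id}/{name})"

def replace_comment_markers(markdown, article_id):
    """Swap each HTML comment marker for an image reference, in place."""
    def substitute(match):
        fields = {}
        # Skip the first line, which holds the type and optional URL.
        for line in match.group(0).splitlines()[1:]:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
        if "filename" not in fields:
            return match.group(0)  # leave malformed markers untouched
        return image_reference(article_id, fields["filename"], fields.get("description", ""))
    return COMMENT_MARKER.sub(substitute, markdown)
```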

2. Semantically Place Images Without Inline Markers

For images that appear only in the Image Summary table:

read the image description carefully
extract its important keywords and concepts
search the document body paragraph by paragraph
find the paragraph that discusses the same concept most directly
insert the image immediately after that paragraph, not just at the end of a broad section

Common mistakes:

appending all leftover images to the end of the article
placing an image at the end of a high-level section instead of after the exact paragraph that discusses the feature
using only section headings instead of reading paragraph content

Example:

if the description says Share panel showing team members and invite controls, prefer the paragraph that mentions inviting teammates rather than the end of a general onboarding section
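As a rough stand-in for the paragraph-by-paragraph reading described above, a keyword-overlap score can pick a candidate paragraph. This is only a crude heuristic sketch: the stopword list, the function names, and the scoring rule are assumptions, and an agent would normally judge the match by reading the content rather than counting words.

```python
import re

STOPWORDS = {"the", "and", "for", "with", "showing", "that", "this", "from"}

def keywords(text):
    """Lowercased words of three or more characters, minus a few stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}

def best_paragraph(description, paragraphs):
    """Return the index of the paragraph sharing the most keywords, or None."""
    if not paragraphs:
        return None
    wanted = keywords(description)
    scores = [len(wanted & keywords(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return best if scores[best] > 0 else None
```

The image would then be inserted immediately after the returned paragraph; a `None` result means no paragraph matched and the placement needs a closer look.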

3. Handle Generated Images

For generated images, use the image-generation reference in {project-root}/references/image-generation.md and the bundled script in {project-root}/scripts/generate_image.py.

If generation succeeds, insert the normal Markdown image reference. If generation fails, insert a warning block:

> Warning: AI image generation failed for {filename}

4. Remove the Image Summary Table

The Image Summary block is workflow metadata and should not remain in the final illustrated Markdown.

Step 6: Write the README Inventory

Create {project-root}/output/{article-id}/README.md with metadata such as:

article id or title
completion timestamp
image list
mapping from marker ids to filenames
dimensions
post-processing notes
unfinished or failed items

Suggested format:

# {article-id} Illustration Output

Article: {title}
Completed: {timestamp}

## Image Inventory

| Filename | Marker | Description | Size | Processing |
|----------|--------|-------------|------|------------|
| A1_example.png | A1 | Workspace dashboard | 1200x800 | resized |

## Notes

- Credentials source: environment variables
- Additional comments

## Remaining Work

- [ ] Any missing screenshot or failed generated image
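Rendering the suggested README format could be sketched as follows. The function name and the keys of each image dictionary are hypothetical; the section headings and table columns follow the suggested format above.

```python
def render_readme(article_id, title, completed, images, remaining=()):
    """Render the README inventory as a Markdown string."""
    lines = [
        f"# {article_id} Illustration Output",
        "",
        f"Article: {title}",
        f"Completed: {completed}",
        "",
        "## Image Inventory",
        "",
        "| Filename | Marker | Description | Size | Processing |",
        "|----------|--------|-------------|------|------------|",
    ]
    for img in images:
        lines.append(
            "| {filename} | {marker} | {description} | {size} | {processing} |".format(**img)
        )
    lines += ["", "## Notes", "", "- Credentials source: environment variables"]
    lines += ["", "## Remaining Work", ""]
    # One unchecked box per missing screenshot or failed generated image.
    lines += [f"- [ ] {item}" for item in remaining]
    return "\n".join(lines) + "\n"
```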

Cache Policy

Use a simple file-based screenshot cache:

cache directory: {project-root}/.cache/screenshots/{article-id}/
cache key: screenshot filename
if a matching cache file exists and the user did not ask for a refresh, reuse it
if the user explicitly asks to recapture or refresh, ignore cache entries
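The cache lookup amounts to a file-existence check keyed on the filename. A minimal sketch, with a hypothetical function name:

```python
from pathlib import Path

def cached_screenshot(project_root, article_id, filename, refresh=False):
    """Return the cached file path if it can be reused, otherwise None."""
    cached = Path(project_root) / ".cache" / "screenshots" / article_id / filename
    if refresh or not cached.is_file():
        # A refresh request ignores the cache entry even when it exists.
        return None
    return cached
```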

Special Cases

Generated Images

When an image type is generated, do not mark it as missing by default. Generate it.

Prerequisites:

OPENROUTER_API_KEY is available
Python requests is installed

Default command:

python {project-root}/scripts/generate_image.py "{description}" -o "{project-root}/output/{article-id}/{filename}"

Use a stronger model for text-heavy images:

python {project-root}/scripts/generate_image.py "{description}" -o "{project-root}/output/{article-id}/{filename}" -m google/gemini-3-pro-image-preview

Generation prompt guidance:

include the subject clearly
include visual style if the document suggests one
mention whether the image is for a technical article, tutorial, or product explainer
mention visible text explicitly if the image needs readable labels

Failure handling:

add a warning block into the output Markdown
record the failure in the README remaining-work section
continue the rest of the workflow
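The generation call and its failure handling could be wrapped as sketched below. The `-o` and `-m` flags and the warning text come from this document; the wrapper names are hypothetical, and the sketch assumes the bundled script exits with a non-zero status on failure, which is not stated here and should be checked against the script itself.

```python
import subprocess
import sys

def build_command(script, description, output_path, model=None):
    """Assemble the generate_image.py command line shown above."""
    cmd = [sys.executable, script, description, "-o", output_path]
    if model:
        cmd += ["-m", model]
    return cmd

def generate_or_warn(script, description, output_path, filename,
                     model=None, runner=subprocess.run):
    """Run the generator; return None on success or a warning block on failure.

    Assumption: a non-zero exit status signals a failed generation.
    """
    result = runner(build_command(script, description, output_path, model),
                    capture_output=True, text=True)
    if result.returncode == 0:
        return None
    return f"> Warning: AI image generation failed for {filename}"
```

A returned warning string can be inserted into the output Markdown and also recorded under Remaining Work in the README, after which the workflow continues with the next image.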

Multilingual Documents

If the document is language-specific, make sure the captured website matches that language. If the site supports language switching, switch before taking screenshots.

Dynamic Pages

Before taking screenshots:

wait for key content to load
close overlays or popups
wait for animations to settle
confirm the page is in the correct state

Output Requirements

When this skill finishes, return a concise summary containing:

article id processed
what work was reused versus newly generated
output Markdown path
image output directory
any failed or missing images

Quick Reference

Project root (ask user, default /tmp/doc-snapshot-agent):
  {project-root}/

Input:
  {project-root}/cases/{article-id}.md

Output:
  {project-root}/output/{article-id}/raw/*.png
  {project-root}/output/{article-id}/*.png
  {project-root}/output/{article-id}/README.md
  {project-root}/output/markdowns/{article-id}.md

Credentials:
  PLAYWRIGHT_CRED_{SERVICE}_{FIELD}

Cache:
  {project-root}/.cache/screenshots/{article-id}/

References:
  {project-root}/references/browser-automation.md
  {project-root}/references/playwright-mcp.md
  {project-root}/references/site-explorer.md
  {project-root}/references/image-generation.md

Bundled script:
  {project-root}/scripts/generate_image.py
