{"skill":{"slug":"url2md","displayName":"URL to Markdown","summary":"The skill url2md converts HTML web pages from HTTP/HTTPS URLs to clean, readable Markdown files with optional batch processing and formatting features.","description":"---\nname: url2md\ndescription: Convert web pages to Markdown format. Use when the user needs to: (1) Extract readable content from a URL and convert it to Markdown, (2) Batch convert multiple URLs to Markdown files, (3) Save web page content as .md for documentation, archiving, or note-taking. Works with any HTTP/HTTPS URL that returns HTML content. Also use when OpenClaw's web_fetch tool is insufficient and a script-based or bulk conversion approach is preferred.\n---\n\n# Url2md\n\nConvert web pages to clean, readable Markdown.\n\n## Quick Start\n\n### Single URL\n\n```bash\npython3 scripts/url2md.py https://example.com/article\n```\n\nOutput to a file:\n```bash\npython3 scripts/url2md.py https://example.com/article -o article.md\n```\n\n### Batch Conversion\n\nCreate a file with URLs (one per line):\n```\nhttps://example.com/article-1\nhttps://example.com/article-2\nhttps://example.com/article-3\n```\n\nConvert all and save to a directory:\n```bash\npython3 scripts/url2md.py -f urls.txt -d ./markdown_files/\n```\n\n## Features\n\n- **No dependencies**: Uses only Python standard library (`urllib`, `html.parser`)\n- **Reader-style scope**: Strips `script`/`style`/`noscript`/`template`, then prefers the first `<article>` or `<main>` (else `<body>`) so output focuses on primary content\n- **Title extraction**: Uses `og:title` / Twitter title when present, otherwise `<title>`, added as H1 when enabled\n- **YAML Frontmatter**: Extracts structured metadata (title, author, published, description, category, source) from `<meta>` tags and Schema.org JSON-LD for knowledge-base workflows\n- **Template system**: Customize output format with variables (`{{title}}`, `{{content}}`, `{{author}}`, `{{published}}`, `{{date}}`, etc.)\n- **Link resolution**: Converts relative URLs to absolute\n- **Basic formatting**: Headings, paragraphs, lists, links, images, fenced code (with optional language), GFM-style tables, bold/italic\n- **Noise removal**: Skips navigation, sidebars, footers, forms, and other chrome inside the parsed fragment\n\n## Script Reference\n\n### `scripts/url2md.py`\n\n**Usage:**\n```\nurl2md.py [url] [options]\n```\n\n**Options:**\n| Option | Description |\n|--------|-------------|\n| `url` | Single URL to convert |\n| `-o, --output` | Output file (default: stdout) |\n| `-f, --file` | File containing URLs to convert |\n| `-d, --dir` | Output directory for batch conversion |\n| `--no-title` | Skip adding page title as H1 |\n| `--full-page` | Parse full `<body>` instead of `<article>`/`<main>` first (more chrome, wider coverage) |\n| `--timeout` | Request timeout in seconds (default: 30) |\n| `--frontmatter` | Add YAML frontmatter with extracted metadata |\n| `-t, --template` | Path to a template file for customizing output |\n| `--filename-template` | Batch mode filename pattern (e.g. `{{date}}-{{title}}.md`) |\n| `--download-images` | Download remote images to a local folder (e.g. `assets`) |\n| `-v, --version` | Show version |\n\n**Examples:**\n```bash\n# Single URL to stdout\npython3 scripts/url2md.py https://docs.python.org/3\n\n# Save to file\npython3 scripts/url2md.py https://docs.python.org/3 -o python-docs.md\n\n# Batch with custom timeout\npython3 scripts/url2md.py -f urls.txt -d ./output/ --timeout 60\n\n# Skip title\npython3 scripts/url2md.py https://example.com --no-title\n\n# Whole body (no article/main focus)\npython3 scripts/url2md.py https://example.com/sitemap --full-page -o sitemap.md\n\n# YAML frontmatter (great for Obsidian / PKM)\npython3 scripts/url2md.py https://example.com/article --frontmatter -o article.md\n\n# Custom template\npython3 scripts/url2md.py https://example.com/article -t article.tpl -o article.md\n\n# Batch with smart filenames\npython3 scripts/url2md.py -f urls.txt -d ./output/ --filename-template \"{{date}}-{{title}}.md\"\n\n# Download images locally\npython3 scripts/url2md.py https://example.com/article -o article.md --download-images assets\npython3 scripts/url2md.py -f urls.txt -d ./output/ --download-images assets\n```\n\n**Template variables:** `{{title}}`, `{{content}}`, `{{url}}`, `{{source}}`, `{{author}}`, `{{published}}`, `{{description}}`, `{{category}}`, `{{site_name}}`, `{{date}}`, `{{datetime}}`\n\n## When to Use\n\n- Converting documentation pages to Markdown for local reference\n- Archiving web articles as text files\n- Building a knowledge base with structured metadata (frontmatter / templates)\n- Building static content from dynamic sources\n- Extracting readable content when browser tools are unavailable\n- Batch processing a list of URLs\n\n## Limitations\n\n- Converts static HTML only; does not execute JavaScript\n- Complex layouts (multi-column, heavy CSS) may lose structural fidelity\n- Login-required or paywalled content requires authentication tokens\n- Rate-limited sites may block repeated requests\n","topics":["Documentation","Batch"],"tags":{"latest":"2.1.2"},"stats":{"comments":0,"downloads":442,"installsAllTime":17,"installsCurrent":0,"stars":2,"versions":7},"createdAt":1778385900087,"updatedAt":1778501344748},"latestVersion":{"version":"2.1.2","createdAt":1778457753432,"changelog":"- No changes detected in this release; documentation and functionality remain unchanged.\n- Version number updated to 2.1.2.","license":"MIT-0"},"metadata":null,"owner":{"handle":"rwonly","userId":"s1794exr2xv33gtjcr357zjtw183nvtn","displayName":"Rex Wang","image":"https://avatars.githubusercontent.com/u/4360619?v=4"},"moderation":null}