markdown.new/crawl

v1.0.0

Use `https://markdown.new/crawl/{target_url}` endpoints to recursively crawl a site section and return Markdown. Trigger this skill when the user asks for m...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for ctxinf/markdown-new-crawl.

Prompt Preview: Install & Setup
Install the skill "markdown.new/crawl" (ctxinf/markdown-new-crawl) from ClawHub.
Skill page: https://clawhub.ai/ctxinf/markdown-new-crawl
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install markdown-new-crawl

ClawHub CLI


npx clawhub@latest install markdown-new-crawl
Security Scan
VirusTotal
Benign
OpenClaw
Benign (high confidence)
Purpose & Capability
Name and description match the instructions: all required actions (POST /crawl, poll status, per-URL GET) are appropriate for a site-crawling-to-Markdown service, and no unrelated credentials, binaries, or paths are requested.
Instruction Scope
SKILL.md stays within crawling behavior (construct requests, set limits, poll status). One minor vagueness: it says to "fall back to another method" if /crawl fails but doesn't define acceptable fallbacks — this could allow broader browsing methods unless the agent is constrained.
Install Mechanism
Instruction-only skill with no install spec or code files; nothing is written to disk and no external packages are fetched by the skill itself.
Credentials
No environment variables, credentials, or config paths are requested. The SKILL.md explicitly assumes no authentication is needed (note below about potential risk if private sites are targeted).
Persistence & Privilege
always:false and normal autonomous invocation are set. The skill does not request elevated or persistent system presence and does not modify other skills or agent-wide config.
Assessment
This skill appears to be a straightforward wrapper for a public crawl API and does not request credentials or install anything. Before installing: verify you trust the endpoint (markdown.new) and its owner; do not point the crawler at private/internal sites or URLs containing secrets (the service will fetch and store crawled content for 14 days by default); be aware of rate limits/costs documented in the SKILL.md; and be cautious about the ambiguous "fallback" behavior — confirm what fallback browsing methods the agent is allowed to use in your environment. If you need to crawl authenticated or sensitive sites, expect to need additional, explicit auth steps which are not described here.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97ds2sxdjv2g8b5awy05jfex983a5vg
134 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

Markdown.New Crawl Local-First Access

Use markdown.new/crawl for multi-page crawling and async Markdown generation.

Required Behavior

  1. Prefer local API calls with curl (or a suitable alternative tool).
  2. Assume no authentication is required unless the service behavior changes.
  3. Start with POST /crawl to get a job ID.
  4. Poll GET /crawl/status/{jobId} until the crawl completes.
  5. Default to Markdown output; use ?format=json only when structured output is needed.
  6. Keep the crawl scope explicit (limit, depth, subdomain/external toggles, include/exclude patterns).
  7. State the default crawl behavior: same-domain links only, up to 500 pages per job.
  8. If /crawl fails (network, timeout, blocked host), fall back to another method and state the fallback.

Core Commands

# Start an async crawl job; the response includes a job ID
curl -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","limit":50}'
# Poll job status; the default output is concatenated Markdown
curl "https://markdown.new/crawl/status/<jobId>"
# Same endpoint as per-page structured JSON
curl "https://markdown.new/crawl/status/<jobId>?format=json"
# Browser-style shortcut that starts a crawl directly from a URL
curl "https://markdown.new/crawl/https://docs.example.com"
# Cancel a running crawl; already-crawled pages remain available
curl -X DELETE "https://markdown.new/crawl/status/<jobId>"
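
Taken together, a minimal end-to-end sketch of the POST-then-poll flow. The response field names (jobId, status) and the completion value "completed" are assumptions about the JSON payload, not documented above; inspect a real response and adjust the jq paths before relying on this.

#!/usr/bin/env bash
# Sketch only: .jobId, .status, and "completed" are ASSUMED response details.
set -euo pipefail

JOB_ID=$(curl -s -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","limit":50}' | jq -r '.jobId')

until [ "$(curl -s "https://markdown.new/crawl/status/${JOB_ID}?format=json" \
    | jq -r '.status')" = "completed" ]; do
  sleep 5   # modest polling interval; each poll is a plain GET
done

# Fetch the finished crawl as concatenated Markdown
curl -s "https://markdown.new/crawl/status/${JOB_ID}"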

API Coverage

  • POST /crawl: create async crawl job and return job ID.
  • GET /crawl/status/{jobId}: return crawl output and status.
  • DELETE /crawl/status/{jobId}: cancel a running crawl; completed pages remain available.
  • GET /crawl/{url}: browser-style shortcut that starts a crawl and returns a tracking page.
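
To make the cancel semantics concrete, the documented DELETE followed by a status fetch; per the coverage notes, pages completed before cancellation should still be returned:

# Cancel a running job, then retrieve whatever pages finished before the cancel
curl -X DELETE "https://markdown.new/crawl/status/<jobId>"
curl "https://markdown.new/crawl/status/<jobId>"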

Common Crawl Options

  • url (required): crawl starting URL.
  • limit: max pages, 1-500 (default 500).
  • depth: max link depth, 1-10 (default 5).
  • render: enable JS rendering for SPA pages.
  • source: URL discovery strategy (all, sitemaps, links).
  • maxAge: max cache age seconds (0-604800, default 86400).
  • modifiedSince: UNIX timestamp; crawl pages modified after this time.
  • includeExternalLinks: include cross-domain links.
  • includeSubdomains: include subdomains.
  • includePatterns / excludePatterns: wildcard URL filtering.
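
As a sketch of how these options combine, a deliberately narrow request; the URL and pattern values are arbitrary examples, and only option names listed above are used:

curl -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 25,
    "depth": 2,
    "includeSubdomains": false,
    "includePatterns": ["/docs/*"],
    "excludePatterns": ["/docs/changelog/*"]
  }'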

Output Notes

  • GET /crawl/status/{jobId} returns concatenated Markdown by default.
  • Add ?format=json for per-page structured records.
  • Images are stripped by default; add ?retain_images=true to keep them.
  • Results are retained for 14 days; expired job IDs return errors.
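
For example, with a still-valid job ID, the two query parameters described above:

# Per-page structured records instead of concatenated Markdown
curl "https://markdown.new/crawl/status/<jobId>?format=json"
# Keep images in the generated Markdown (stripped by default)
curl "https://markdown.new/crawl/status/<jobId>?retain_images=true"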

Operational Limits

  • Rate model (as documented in the crawl page FAQ): each crawl costs 50 units against a 500-unit daily limit (about 10 crawls/day).
  • Prefer lower limit values for targeted extraction to reduce cost and runtime.
