Web Fetch as Markdown
Fetches any web URL and converts it to clean, structured Markdown — stripping ads, navigation, and clutter to leave only readable content, making it far easier for Agents to parse and extract data compared to raw HTML.
Conversion Services & Priority
This skill uses reputable third-party APIs to facilitate conversion. Always be transparent with the user about which service is being used.
- Primary —
https://markdown.new/: Official Cloudflare edge conversion service. Highly reliable for general sites, especially Cloudflare-hosted ones, and optimized for token reduction.
- Fallback 1 —
https://r.jina.ai/<url>: Jina AI's official Reader API. Excellent for parsing clean markdown from complex pages when the primary service fails.
- Fallback 2 (Requires Consent) —
https://markdownforagents.com/r?url=<url>: General-purpose converter returning Markdown with YAML frontmatter. Only use when primary services are unreachable, and after informing the user.
Execution Flow & Graceful Degradation
When tasked with fetching a URL, follow this strict sequence:
-
Attempt Primary Service: Construct the URL by prepending the target to the primary service:
https://markdown.new/https://example.com/article
Use the built-in WebFetch tool or equivalent to retrieve the content.
-
Graceful Fallback to Jina: If the primary service fails, times out, or returns a domain safety warning (e.g., "Unable to verify if domain is safe"), gracefully fall back to Jina Reader:
https://r.jina.ai/https://example.com/article
-
User Consent for Unverified Services: If both Cloudflare and Jina are unreachable (e.g., due to regional network policies), prompt the user:
"The primary markdown conversion services (Cloudflare & Jina) are currently unreachable. Would you like me to attempt fetching via markdownforagents.com, or would you prefer to provide the content locally?"
Do not proceed to markdownforagents.com without explicit user approval.
-
Final Limitation Handling: If all API routes fail or are denied by the user, inform the user that the content cannot be fetched automatically due to network or security constraints. Do not attempt to use curl with silent flags or suggest network workarounds.
Trigger Scenarios
User Intent (Chinese):
- "上网搜索"、"帮我查一下"、"帮我看看这个网页"、"fetch 网页"
- "帮我抓取这个页面"、"帮我解析一下"、"打开这个链接"
User Intent (English):
- "search online", "look up", "fetch this page", "scrape this website"
- "parse the data from", "extract information from"
System Scenarios:
- The built-in WebFetch tool returns a domain safety restriction (e.g., "Unable to verify if domain *** is safe to fetch").
- The user asks to read/extract content from a specific URL where direct HTML parsing is inefficient.
Safety & Compliance Guidelines
- Respect Platform Boundaries: If the platform's built-in tools block a domain for safety reasons, do not instruct the agent to bypass these safeguards. Use the approved fallback APIs instead.
- No Network Evasion: Never suggest configuring proxies, VPNs, or other methods to circumvent network restrictions or regional blocks.
- Data Transparency: Be aware that sending URLs to conversion services means the URL path (and potentially any non-authenticated content on that page) is exposed to the API provider. Avoid fetching pages known to contain sensitive authentication tokens in the URL.