zhipu web fetch

v1.0.0

Zhipu AI Web Page Reader Tool - Fetches and parses web page content into structured Markdown or text via cURL. Use when: - Need to fetch and read the content...

1 version · Updated 1mo ago · License: MIT-0

Zhipu Web Page Reader

Fetch and parse web page content through Zhipu AI's Reader API (/paas/v4/reader) with a plain cURL call. The API returns the parsed page as Markdown or plain text, together with metadata such as the title and description.

Quick Start

Basic cURL Usage

curl --request POST \
  --url https://open.bigmodel.cn/api/paas/v4/reader \
  --header "Authorization: Bearer $ZHIPU_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://www.example.com"
  }'

Script Usage

A wrapper shell script is provided for convenience.

# Basic Fetch (returns Markdown by default)
bash scripts/zhipu_fetch.sh --url "https://www.example.com"

# Fetch as plain text, no cache
bash scripts/zhipu_fetch.sh \
  --url "https://docs.python.org/3/" \
  --format text \
  --no-cache

# Fetch with image and link summaries
bash scripts/zhipu_fetch.sh \
  --url "https://news.example.com/article" \
  --images-summary \
  --links-summary

# Fetch without images, disable GFM
bash scripts/zhipu_fetch.sh \
  --url "https://blog.example.com/post" \
  --no-images \
  --no-gfm

API Parameter Reference

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | URL of the web page to fetch |
| timeout | integer | No | 20 | Request timeout in seconds |
| no_cache | boolean | No | false | Disable caching (true/false) |
| return_format | string | No | markdown | Return format: markdown or text |
| retain_images | boolean | No | true | Retain images in output (true/false) |
| no_gfm | boolean | No | false | Disable GitHub Flavored Markdown (true/false) |
| keep_img_data_url | boolean | No | false | Keep image data URLs (true/false) |
| with_images_summary | boolean | No | false | Include images summary (true/false) |
| with_links_summary | boolean | No | false | Include links summary (true/false) |
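
The optional parameters go in the same JSON body as url. The following is a minimal sketch of a direct cURL call that sets a few of them. The helper names build_payload and fetch_page are hypothetical and not part of this skill, and the sketch assumes the URL contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helpers, not part of the skill: build a Reader API request body
# and POST it to the endpoint shown in Quick Start.

# Print a JSON body. Arguments: url [return_format] [no_cache] [timeout].
# Omitted arguments fall back to the documented defaults.
# Assumption: the URL needs no JSON escaping (no double quotes or backslashes).
build_payload() {
  local url="$1"
  local format="${2:-markdown}"
  local no_cache="${3:-false}"
  local timeout="${4:-20}"
  printf '{"url":"%s","return_format":"%s","no_cache":%s,"timeout":%s}\n' \
    "$url" "$format" "$no_cache" "$timeout"
}

# Send the request. --fail makes curl exit non-zero on HTTP error statuses.
fetch_page() {
  curl --silent --show-error --fail \
    --request POST \
    --url https://open.bigmodel.cn/api/paas/v4/reader \
    --header "Authorization: Bearer $ZHIPU_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$(build_payload "$@")"
}
```

For example, `fetch_page "https://docs.python.org/3/" text true 60` would request plain text, bypass the cache, and allow a 60-second timeout.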

Response Structure

The API returns JSON with the parsed page content.

{
  "id": "task-id",
  "created": 1704067200,
  "request_id": "request-id",
  "model": "model-name",
  "reader_result": {
    "title": "Page Title",
    "description": "Brief page description",
    "url": "https://www.example.com",
    "content": "Parsed page content (Markdown or text)",
    "external": {
      "stylesheet": {}
    },
    "metadata": {
      "keywords": "page, keywords",
      "viewport": "width=device-width",
      "description": "Meta description",
      "format-detection": "telephone=no"
    }
  }
}

Key Response Fields

| Field | Description |
| --- | --- |
| reader_result.content | Main parsed content (body text, images, links) |
| reader_result.title | Page title |
| reader_result.description | Brief page description |
| reader_result.url | Original page URL |
| reader_result.metadata | Page metadata (keywords, viewport, etc.) |
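
To use these fields from the shell, one option is to pipe the response through jq. This is only a sketch: jq is an assumption here and is not among the skill's listed requirements, and extract_content and extract_title are hypothetical helper names.

```shell
# Hypothetical helpers: read a Reader API JSON response on stdin and print one field.
# Assumption: jq is installed (it is not a listed requirement of this skill).

# Print the parsed page body, or nothing if the field is missing.
extract_content() {
  jq -r '.reader_result.content // empty'
}

# Print the page title, or nothing if the field is missing.
extract_title() {
  jq -r '.reader_result.title // empty'
}
```

A saved response could then be turned into a Markdown file with `extract_content < response.json > page.md`, where response.json is a placeholder file name.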

Common Use Cases

| Scenario | Arguments to scripts/zhipu_fetch.sh |
| --- | --- |
| Read a documentation page | --url <doc_url> |
| Extract text only (no images) | --url <url> --no-images --format text |
| Force fresh fetch (bypass cache) | --url <url> --no-cache |
| Get content with all summaries | --url <url> --images-summary --links-summary |
| Long page with extended timeout | --url <url> --timeout 60 |

Environment Requirements

  • The ZHIPU_API_KEY environment variable must be set.
  • The curl command must be available on your PATH.
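
Both requirements can be checked up front before any request is made. A minimal sketch, where check_requirements is a hypothetical helper name and not something the skill ships:

```shell
# Hypothetical preflight check: succeed only if the API key is set
# and curl can be found on PATH.
check_requirements() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set" >&2
    return 1
  fi
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found on PATH" >&2
    return 1
  fi
  return 0
}
```

Running `check_requirements && bash scripts/zhipu_fetch.sh --url "https://www.example.com"` would skip the fetch and print a reason to stderr when either requirement is missing.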

Runtime requirements

Bins: curl
Env: ZHIPU_API_KEY