xiaohongshu-extract

v1.0.0

Extract metadata from Xiaohongshu (XHS) share or discovery URLs by parsing window.__INITIAL_STATE__ and returning note details. Use when asked to fetch XHS page content, note metadata, video info, or engagement stats from a public XHS link.

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jovijovi/xiaohongshu-extract.

Install the skill "xiaohongshu-extract" (jovijovi/xiaohongshu-extract) from ClawHub.
Skill page: https://clawhub.ai/jovijovi/xiaohongshu-extract
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install xiaohongshu-extract

ClawHub CLI

npx clawhub@latest install xiaohongshu-extract

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name/description state that the skill extracts XHS metadata. The bundled Python script issues an HTTP GET, locates window.__INITIAL_STATE__, parses JSON, and returns structured/flattened fields (note, user, interact, video). No unrelated credentials, binaries, or system paths are requested.
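
The parsing step described above can be sketched roughly as follows. This is not the bundled script's code, only a minimal illustration of locating `window.__INITIAL_STATE__` in an HTML string and decoding the object that follows it. The function name `extract_initial_state` is hypothetical, and the handling of bare `undefined` values is an assumption about what such a payload may contain.

```python
import json
import re

# Matches the assignment that precedes the embedded state object.
MARKER = re.compile(r"window\.__INITIAL_STATE__\s*=\s*")


def extract_initial_state(html: str) -> dict:
    """Return the object assigned to window.__INITIAL_STATE__ in an HTML page."""
    match = MARKER.search(html)
    if match is None:
        raise ValueError("window.__INITIAL_STATE__ not found")
    payload = html[match.end():]
    # Assumption: bare JavaScript `undefined` values may appear and are not
    # valid JSON, so map them to null. This naive substitution would also
    # rewrite the word "undefined" inside string values.
    payload = re.sub(r"\bundefined\b", "null", payload)
    # raw_decode parses one JSON value and ignores trailing text such as </script>.
    state, _ = json.JSONDecoder().raw_decode(payload)
    return state


# Illustrative input only, not real page content.
sample = '<script>window.__INITIAL_STATE__={"note":{"title":"hi","extra":undefined}}</script>'
print(extract_initial_state(sample)["note"])
```

Using `raw_decode` avoids having to find the end of the object with a regular expression, which tends to break on nested braces.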
Instruction Scope
SKILL.md instructs running the script with a user-provided URL and writing/printing JSON. That matches the code. However, the skill will perform arbitrary outbound HTTP requests to whatever URL is supplied; SKILL.md does not restrict inputs to official XHS domains. This introduces SSRF-like risk: if invoked with internal or otherwise sensitive URLs (e.g., 169.254.169.254, internal IPs, management endpoints), the agent will fetch and may return data from those endpoints. The script does not download video content itself (it only returns stream URLs), but it follows redirects and returns final_url/status_code.
Install Mechanism
There is no install spec (instruction-only skill + included script). That is low-risk. The script requires a Python runtime and the 'requests' library, but no installer or external downloads are performed by the skill itself.
Credentials
The skill declares no environment variables, credentials, or config paths and the code does not read environment secrets. It sets a custom User-Agent header for requests (expected). No disproportionate credential access is requested.
Persistence & Privilege
The skill is not marked always:true and does not modify other skills or system configs. It runs on invocation and does not request persistent platform-level privileges.
Assessment
This skill appears to do what it claims (fetch an XHS page and parse window.__INITIAL_STATE__). Before installing or invoking it: (1) Be careful what URLs you provide — the script will make HTTP requests to any given URL and could fetch internal/metadata endpoints (SSRF risk). Do not pass sensitive or private URLs. (2) The script returns streaming URLs it finds but does not itself download video files. (3) Ensure the runtime has Python and the requests package available and run the skill in an isolated environment if you are concerned about network exposure. If you need stricter safety, require or validate that input URLs belong to official Xiaohongshu domains before invocation.
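
One way to apply the domain check suggested above is to validate the URL before passing it to the script. The sketch below is an assumption-laden example, not part of the skill: the helper name `is_allowed_xhs_url` is hypothetical, and the allowlist (`xiaohongshu.com`, `xhslink.com`) should be adjusted to the domains you actually trust.

```python
from urllib.parse import urlsplit

# Assumption: these host suffixes are the ones to allow; edit as needed.
ALLOWED_SUFFIXES = ("xiaohongshu.com", "xhslink.com")


def is_allowed_xhs_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is an allowed domain or subdomain."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 literals.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Exact match or a true subdomain; "xiaohongshu.com.evil.example" fails both tests.
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)


print(is_allowed_xhs_url("https://www.xiaohongshu.com/explore/abc"))
print(is_allowed_xhs_url("http://169.254.169.254/latest/meta-data/"))
```

This check covers only the URL that is supplied. Because the script follows redirects, it does not by itself prevent a permitted URL from redirecting elsewhere.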

latest: vk97494mqgfwrd8vyq3cawqcv4h80jar1
stable: vk97494mqgfwrd8vyq3cawqcv4h80jar1
Downloads: 1.8k
Stars: 1
Versions: 1
Updated 2mo ago
Version: v1.0.0
License: MIT-0

Xiaohongshu Extract

Overview

Extract note metadata (title, desc, type, time, user, engagement, tags, video stream info) from an XHS share or discovery URL using the bundled script.

Quick Start

Run the extractor and print JSON to stdout:

python scripts/xiaohongshu_extract.py "<xhs_url>" --pretty

Write JSON to a file:

python scripts/xiaohongshu_extract.py "<xhs_url>" --output /tmp/xhs_note.json

Output only the flattened record:

python scripts/xiaohongshu_extract.py "<xhs_url>" --flat-only --pretty

Write only the flattened record to a file:

python scripts/xiaohongshu_extract.py "<xhs_url>" --flat-only --output /tmp/xhs_flat.json

Emit errors as JSON:

python scripts/xiaohongshu_extract.py "<xhs_url>" --error-json

Emit errors as JSON to a file:

python scripts/xiaohongshu_extract.py "<xhs_url>" --error-json --output /tmp/xhs_error.json

Workflow

  1. Run scripts/xiaohongshu_extract.py with the user-provided URL.
  2. If the script fails to find window.__INITIAL_STATE__, ask the user for a direct discovery URL.
  3. Use the JSON output to summarize note metadata or to feed downstream analysis.
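
The workflow above could be driven from Python along these lines. This is a sketch under stated assumptions: the helper names `build_command` and `run_extractor` are hypothetical, the script path is taken relative to the skill directory, and the JSON (result or error) is assumed to be written to stdout, which the page does not confirm.

```python
import json
import subprocess
import sys

# Path as given in the Quick Start commands, relative to the skill directory.
SCRIPT = "scripts/xiaohongshu_extract.py"


def build_command(url: str, flat_only: bool = False) -> list[str]:
    """Assemble the extractor command line using the documented flags."""
    cmd = [sys.executable, SCRIPT, url, "--error-json"]
    if flat_only:
        cmd.append("--flat-only")
    return cmd


def run_extractor(url: str, flat_only: bool = False) -> dict:
    """Run the script and decode whatever JSON it prints."""
    proc = subprocess.run(build_command(url, flat_only), capture_output=True, text=True)
    # Assumption: both results and --error-json output arrive on stdout.
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"extractor produced no JSON (exit code {proc.returncode}): {proc.stderr.strip()}"
        ) from exc
```

A caller would invoke `run_extractor("<xhs_url>")`. If the output shows that `window.__INITIAL_STATE__` was not found, step 2 applies: ask the user for a direct discovery URL.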

Output Notes

The script returns a JSON object with:

  • note_id, title, desc, type, time, ip_location
  • user (nickname, user_id, avatar)
  • interact (liked/collected/comment/share counts, plus normalized *_num values)
  • tags
  • video (video_id, duration, width, height, fps, size, stream_url)
  • field_mapping (nested-to-flat field name map)
  • flat (flattened record with normalized counts and ISO timestamp)

If the stream list is empty, video fields may be null or empty.

If --flat-only is set, only flat is printed. If --error-json is set, errors are emitted as JSON and may include final_url and status_code when available.
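
As an illustration of consuming the output described above, the following sketch builds a short text summary from the top-level fields. The helper name `summarize_note` is hypothetical, the example record is made-up data, and `tags` is assumed to be a list of strings. Missing or null fields are tolerated, since video fields may be null or empty.

```python
def summarize_note(record: dict) -> str:
    """Build a short plain-text summary from the extractor's JSON object."""
    user = record.get("user") or {}
    video = record.get("video") or {}
    tags = record.get("tags") or []  # assumption: a list of strings
    lines = [
        f"note_id: {record.get('note_id')}",
        f"title: {record.get('title')}",
        f"author: {user.get('nickname')}",
        f"tags: {', '.join(str(t) for t in tags)}",
    ]
    # Only mention the stream when the extractor actually found one.
    if video.get("stream_url"):
        lines.append(f"stream_url: {video['stream_url']}")
    return "\n".join(lines)


# Made-up record for illustration; real output contains more fields.
example = {
    "note_id": "abc123",
    "title": "Sample",
    "user": {"nickname": "demo"},
    "tags": ["a", "b"],
    "video": {"stream_url": None},
}
print(summarize_note(example))
```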

Resources

scripts/

  • scripts/xiaohongshu_extract.py extracts note metadata from XHS share/discovery URLs.
