Install

openclaw skills install rednote-contacts

This skill runs the installed red-crawler CLI for Xiaohongshu contact discovery. It requires the red-crawler command and a Playwright browser runtime; it is not instruction-only.

Use this skill when you need to operate the installed red-crawler CLI from an OpenClaw workflow. It is the portable wrapper for the crawler runtime, not a separate crawler implementation.
Use red-crawler-ops to operate the crawler. All crawling tasks must use the native red-crawler CLI commands described below.

crawl-homefeed

Collect users from the Xiaohongshu fashion homefeed. This is the default crawl mode when action is omitted. It clicks card author links, not note links.
red-crawler crawl-homefeed \
--homefeed-url "https://www.xiaohongshu.com/explore?channel_id=homefeed.fashion_v3" \
--max-accounts 20 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
For IP rotation in local browser mode, provide proxy or proxy_list and set rotation_mode: session. The wrapper passes these through to red-crawler; after a 403 or 429 response, the crawler starts a new browser session using the next proxy. Each proxy maps deterministically to one browser header set, so the same outbound IP keeps the same User-Agent.
Bright Data Browser API mode does not use local proxy settings. Use browser_mode: bright-data plus browser_auth or browser_endpoint; if either value contains {session}, the crawler replaces it with a random session id for each browser session.
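For illustration, a minimal sketch of the two setups as skill input objects, assuming the input fields listed later in this document; the proxy URLs and the endpoint are placeholders, not working values:

{
  "action": "crawl_homefeed",
  "proxy_list": ["http://PROXY_A:8000", "http://PROXY_B:8000"],
  "rotation_mode": "session"
}

{
  "action": "crawl_homefeed",
  "browser_mode": "bright-data",
  "browser_endpoint": "wss://EXAMPLE_ENDPOINT?session={session}"
}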
crawl-seed

Crawl a specific Xiaohongshu user profile and extract contact information.
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--max-accounts 5 \
--max-depth 2 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
Parameters:
- --seed-url (required): Target user profile URL
- --storage-state: Optional Playwright storage state file
- --max-accounts: Maximum accounts to crawl (default: 20)
- --max-depth: Crawl depth for related accounts (default: 2)
- --include-note-recommendations: Include note recommendations
- --safe-mode: Enable safe mode
- --cache-dir: Cache directory path
- --cache-ttl-days: Cache TTL in days (default: 7)
- --db-path: SQLite database path (default: ./data/red_crawler.db)
- --output-dir: Output directory (default: ./output)

Outputs:

- accounts.csv: Crawled account information
- contact_leads.csv: Extracted contact information (emails, etc.)
- run_report.json: Execution report

login

Optional interactive login to save a browser session.
red-crawler login --save-state "./state.json"
Parameters:
- --save-state (required): Path to save storage state
- --login-url: Login page URL (default: https://www.xiaohongshu.com)

QR login

QR code-based login for headless environments.
# Start QR login (generates QR code)
red-crawler login-qr-start \
--save-state "./state.json" \
--qr-path "./login-qr.png" \
--session-path "./login-session.json" \
--timeout 180
# Finish QR login after user scans
red-crawler login-qr-finish \
--save-state "./state.json" \
--session-path "./login-session.json"
collect-nightly

Run scheduled nightly data collection.
red-crawler collect-nightly \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--crawl-budget 30 \
--search-term-limit 4
Parameters:
- --storage-state: Optional storage state file
- --db-path: Database path (default: ./data/red_crawler.db)
- --report-dir: Report directory (default: ./reports)
- --cache-dir: Cache directory
- --cache-ttl-days: Cache TTL (default: 7)
- --crawl-budget: Crawl budget (default: 30)
- --search-term-limit: Search term limit (default: 4)
- --startup-jitter-minutes: Startup jitter
- --slot-name: Slot name for scheduling

report-weekly

Export weekly reports from the database.
red-crawler report-weekly \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--days 7
Parameters:
- --db-path: Database path (default: ./data/red_crawler.db)
- --report-dir: Report directory (default: ./reports)
- --days: Report period in days (default: 7)

Outputs:

- weekly-growth-report.json
- contactable_creators.csv

list-contactable

List contactable creators from the database.
red-crawler list-contactable \
--db-path "./data/red_crawler.db" \
--lead-type "email" \
--creator-segment "creator" \
--min-relevance-score 0.5 \
--limit 20 \
--format csv
Parameters:
- --db-path: Database path (default: ./data/red_crawler.db)
- --lead-type: Lead type filter (default: email)
- --creator-segment: Creator segment filter (default: creator)
- --min-relevance-score: Minimum relevance score (default: 0.0)
- --limit: Result limit (default: 20)
- --format: Output format, table or csv (default: table)

open

Open Xiaohongshu in a browser with a saved session.
red-crawler open --storage-state "./state.json"
Actions

The skill accepts the following actions:

- bootstrap
- crawl_homefeed (default when omitted)
- crawl_seed
- login
- collect_nightly
- report_weekly
- list_contactable
- job_status
- job_logs
- job_stop
- ack_event

For long crawl actions, set run_mode: background. The skill starts a local background wrapper, returns a job_id immediately, and writes OpenClaw-readable state under heartbeat_dir (default ./.openclaw/red-crawler).
Background jobs write:
- HEARTBEAT.md: concise status and pending user updates for OpenClaw heartbeat polling
- jobs/<job_id>.json: machine-readable job state
- events/<job_id>.jsonl: completion or failure events
- logs/jobs/<job_id>.out.log and .err.log: crawler logs

Use job_status with the returned job_id to read the latest state manually. Use job_logs to inspect recent output and job_stop to request termination. If OpenClaw heartbeat polling is enabled, the agent can read HEARTBEAT.md and surface pending completion events to the user on a later heartbeat cycle. After surfacing an event, call ack_event with the event_id so the same update is not reported again. A sketch of this lifecycle follows.
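A minimal sketch of the background lifecycle as a sequence of input objects, assuming the actions and fields described in this document; JOB_ID and EVENT_ID are placeholders for whatever the earlier calls return:

{ "action": "crawl_seed", "seed_url": "https://www.xiaohongshu.com/user/profile/USER_ID", "run_mode": "background" }

This returns a job_id immediately. Inspect the job with:

{ "action": "job_status", "job_id": "JOB_ID" }
{ "action": "job_logs", "job_id": "JOB_ID", "tail_lines": 50 }

After surfacing a completion event, acknowledge it so it is not reported again:

{ "action": "ack_event", "event_id": "EVENT_ID" }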
Typical requests map to actions as follows:

- Set up an existing workspace: bootstrap
- Run the default crawl: crawl_homefeed
- Fetch or refresh the Playwright session state: login
- Continue crawling based on the database queue: collect_nightly
- Produce a weekly report: report_weekly, pointing to the workspace's DB

Crawling New Data vs Querying Database:

- To crawl new data: use crawl_seed with seed_url, setting max_accounts to 10. Note: crawling new data requires a seed URL.
- To query existing data: set action to list_contactable, limit to 10, and creator_segment to "美妆" (beauty) to filter the local SQLite database.

Both requests are sketched as input objects below.
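As hedged sketches using the field names listed later in this document, with illustrative values:

{ "action": "crawl_seed", "seed_url": "https://www.xiaohongshu.com/user/profile/USER_ID", "max_accounts": 10 }

{ "action": "list_contactable", "limit": 10, "creator_segment": "美妆" }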
The skill also understands technical prompt variations, e.g. "... install_browser: true after I have installed the CLI." or "... output/."

Windows (WSL2)

On Windows, red-crawler runs inside WSL2. You need:
sudo apt-get update
sudo apt-get install -y git python3 python3-pip
Known Issues & Fixes:

DISPLAY not set (WSLg)

- Cause: missing X server or $DISPLAY.
- Fix: export DISPLAY=:0

Headless vs Headed browser

- The login command requires a headed browser (GUI).
- crawl-seed and other commands also require a headed browser on WSL.
- Set DISPLAY=:0 before running any command that uses a browser.

If no real display is available, set up a virtual one with Xvfb and install the CLI:

sudo apt-get update
sudo apt-get install -y git python3 python3-pip
sudo apt-get install -y xvfb
export DISPLAY=:99
Xvfb :99 -screen 0 1024x768x16 &
uv tool install red-crawler==0.1.3
Notes:

- Run bootstrap with install_browser: true to install the Playwright browser (see the sketch after these notes).
- Install red-crawler as a package, then point workspace_path at a local working directory.
- Set require_local_checkout: true only when you intentionally want to run from a source checkout (a directory with a pyproject.toml).
- uv is only required when sync_dependencies: true is used for a local source checkout.
- bootstrap does not create a login session. Use login explicitly.
- login explicitly creates an optional Playwright storage state.
- crawl_seed, crawl_homefeed, and collect_nightly can run without a storage state file.
- report_weekly and list_contactable run from the database and do not require storage state.
- Run login only when the user asks to authenticate.
- Do not create state.json unless the user explicitly asks to authenticate.
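A minimal bootstrap sketch as an input object, assuming the fields listed in the next section; the workspace path is a placeholder:

{ "action": "bootstrap", "workspace_path": "./red-crawler-workspace", "install_browser": true }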
Provide an object with action plus optional fields used by the selected action. Common fields include:

- workspace_path
- require_local_checkout
- runner_command
- storage_state
- db_path
- report_dir
- output_dir
- cache_dir
- heartbeat_dir
- job_log_dir
- run_mode (sync/background)
- job_id
- event_id
- tail_lines

Action-specific fields include:
- sync_dependencies
- install_browser
- seed_url
- login_url
- max_accounts
- max_depth
- include_note_recommendations
- safe_mode
- cache_ttl_days
- gender_filter (male/female or 男/女)
- crawl_budget
- search_term_limit
- startup_jitter_minutes
- slot_name
- days
- lead_type
- creator_segment
- min_relevance_score
- limit
- format
- browser_mode
- browser_endpoint
- browser_auth
- proxy
- proxy_list
- rotation_mode (none/session)
- rotation_retries
- randomize_headers
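For illustration, a hedged input sketch combining common and action-specific fields from the lists above; all values are placeholders:

{
  "action": "collect_nightly",
  "workspace_path": "./red-crawler-workspace",
  "db_path": "./data/red_crawler.db",
  "report_dir": "./reports",
  "crawl_budget": 30,
  "search_term_limit": 4,
  "run_mode": "background"
}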
Successful runs return:

- status
- action
- command
- summary
- artifacts
- metrics
- next_step
- stdout
- stderr
- job_id and pid when run_mode: background accepts a job
- job for job status actions
Error runs return:

- status
- action
- error_type
- message
- suggested_fix
- action, command, stdout, and stderr for execution-time failures
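A hedged sketch of both return shapes, using only the fields listed above; all values are illustrative placeholders:

{
  "status": "ok",
  "action": "crawl_seed",
  "command": "red-crawler crawl-seed --seed-url ...",
  "summary": "...",
  "artifacts": ["output/accounts.csv", "output/contact_leads.csv", "output/run_report.json"],
  "metrics": {},
  "next_step": "..."
}

{
  "status": "error",
  "action": "crawl_seed",
  "error_type": "...",
  "message": "...",
  "suggested_fix": "..."
}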