Install
openclaw skills install datalens-web-scraper

Use DataLens MCP tools to scrape structured data from any website open in Chrome. Triggers when the user wants to extract lists, tables, comments, products,...

Every DataLens tool is invoked by running a terminal command. No MCP client configuration is required.
The datalens-mcp-call binary handles the MCP stdio handshake and returns the tool result as YAML/JSON to stdout.
run_in_terminal: datalens-mcp-call <tool_name> '<args_json>'
If datalens-mcp-call is not on PATH (e.g. not globally installed), use npx:
run_in_terminal: npx datalens-mcp-call <tool_name> '<args_json>'
Prerequisites: the datalens-mcp-server npm package installed (npm install -g datalens-mcp-server, or use npx), and the target page open in Chrome (or pass a url in the tool args — the extension will open it).

datalens-mcp-call spawns the DataLens MCP proxy as a child process, performs the MCP initialization handshake over stdio, calls the requested tool, and prints the result.
AI Agent
↓ run_in_terminal
datalens-mcp-call <tool> <args>
↓ stdio JSON-RPC
DataLens MCP Proxy (datalens-mcp-proxy)
↓ WebSocket (localhost:17373)
Chrome Extension
↓
Browser Tab
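The stdio leg of the pipeline above is plain JSON-RPC 2.0. A minimal Python sketch of the message sequence datalens-mcp-call exchanges with the proxy (the protocolVersion string and clientInfo values are placeholders, not taken from the tool's actual source):

```python
import json

def mcp_messages(tool: str, args: dict) -> list[str]:
    """Build the JSON-RPC messages for an MCP initialize + tools/call sequence."""
    init = {"jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {"protocolVersion": "2024-11-05",  # placeholder version
                       "capabilities": {},
                       "clientInfo": {"name": "datalens-mcp-call", "version": "0.0.0"}}}
    # Client acknowledges the server's initialize response before calling tools.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
            "params": {"name": tool, "arguments": args}}
    return [json.dumps(m) for m in (init, initialized, call)]

msgs = mcp_messages("scrape_status", {"jobId": "abc", "waitMs": 3000})
```

Each message is written to the proxy's stdin as one line; the tool result comes back on stdout the same way.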
Follow these steps in order. Do not skip steps or call scrape_start before scrape_analyze_columns completes.
datalens-mcp-call scrape_detect_tables '{"url":"https://example.com","prompt":"article list"}'
Returns a list of detected table structures with rootSelector, itemSelector, documentInfoPath. Pick the best-matching table and copy those three values for subsequent steps.
If the page requires login, ask the user to log in through Chrome first, then re-run this command.
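Picking the best candidate and copying its three values can be scripted; a sketch assuming the detect output parses as JSON with a top-level tables array (that key name and the sample values are assumptions for illustration):

```python
import json

# Assumed shape of scrape_detect_tables output; the "tables" key name is a guess.
detect_output = json.loads("""{"tables": [
  {"rootSelector": "div.feed", "itemSelector": "article", "documentInfoPath": "0.1"}
]}""")

# Take the first (best-ranked) candidate and keep the three values needed downstream.
best = detect_output["tables"][0]
root, item, doc_path = best["rootSelector"], best["itemSelector"], best["documentInfoPath"]
```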
datalens-mcp-call scrape_get_table_tree '{"rootSelector":"<from step 1>","itemSelector":"<from step 1>","documentInfoPath":"<from step 1>"}'
Use when the data has nested replies, collapsed rows, or "load more" buttons. Inspect the _uid-annotated tree in the output to identify expand button UIDs.
datalens-mcp-call scrape_click_expand_and_redetect '{"rootSelector":"...","itemSelector":"...","documentInfoPath":"...","expandButtonUids":[{"type":"reply","uids":["uid1","uid2"]}]}'
The extension clicks the buttons, waits for new content, then re-detects. Use the updated rootSelector/itemSelector/documentInfoPath from this output in Step 3.
datalens-mcp-call scrape_analyze_columns '{"rootSelector":"...","itemSelector":"...","documentInfoPath":"...","url":"https://example.com","prompt":"article list"}'
Calls the backend AI to identify fields, data types, and pagination. Returns a scraperConfig and jobDraft. Confirm the field list looks correct before proceeding.
# Pass the jobDraft object returned by scrape_analyze_columns
datalens-mcp-call scrape_start '{"jobDraft":<paste jobDraft here>,"maxRecords":10}'
Returns a jobId. Use maxRecords: 10 for a preview run first.
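Since the jobDraft must be embedded verbatim as a JSON object (not a string), composing the args programmatically avoids quoting mistakes; a sketch with an invented jobDraft shape:

```python
import json

# Hypothetical jobDraft as returned by scrape_analyze_columns; shape invented for illustration.
job_draft = {"rootSelector": "div.feed", "itemSelector": "article",
             "fields": [{"name": "title"}, {"name": "url"}]}

# Embed the draft object directly; do not re-serialize it as a quoted string.
args = json.dumps({"jobDraft": job_draft, "maxRecords": 10})
print(f"datalens-mcp-call scrape_start '{args}'")
```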
datalens-mcp-call scrape_status '{"jobId":"<jobId>","waitMs":3000}'
Re-run until status is COMPLETED, FAILED, or STOPPED.
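The re-run loop can be automated; a Python sketch where fetch_status stands in for the datalens-mcp-call scrape_status invocation (here faked with canned responses):

```python
import time

TERMINAL = {"COMPLETED", "FAILED", "STOPPED"}

def poll_until_done(fetch_status, interval_s=0.0, max_polls=100):
    """Call fetch_status() until the job reaches a terminal state; return the last response."""
    for _ in range(max_polls):
        resp = fetch_status()
        if resp["status"] in TERMINAL:
            return resp
        time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state")

# Usage with a fake status source standing in for the CLI:
fake = iter([{"status": "QUEUED"}, {"status": "RUNNING", "scrapedCount": 4},
             {"status": "COMPLETED", "scrapedCount": 10}])
result = poll_until_done(lambda: next(fake))
```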
Key status fields:
status: QUEUED → PREPARING → RUNNING → COMPLETED / FAILED / STOPPED
scrapedCount: rows collected so far
error: present only on failure

Save to file (recommended for large results):
datalens-mcp-call scrape_export_to_file '{"jobId":"<jobId>","outputDir":"/tmp/datalens","format":"json"}'
Returns the saved file path.
Inline preview (small result sets):
datalens-mcp-call scrape_result '{"jobId":"<jobId>","limit":50}'
Use the cursor field from each response to fetch the next page.
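Cursor paging can be collected in a loop; a sketch where fetch_page stands in for scrape_result, with the cursor field named as in the text above (the "rows" key is an assumption):

```python
def fetch_all_rows(fetch_page):
    """Repeatedly call fetch_page(cursor) until no cursor is returned; collect all rows."""
    rows, cursor = [], None
    while True:
        page = fetch_page(cursor)
        rows.extend(page.get("rows", []))  # "rows" key name is an assumption
        cursor = page.get("cursor")
        if not cursor:
            return rows

# Fake two-page result standing in for scrape_result responses:
pages = {None: {"rows": [1, 2], "cursor": "c1"}, "c1": {"rows": [3]}}
all_rows = fetch_all_rows(lambda c: pages[c])
```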
In-memory export:
datalens-mcp-call scrape_export '{"jobId":"<jobId>","format":"csv"}'
Returns base64-encoded file content.
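Decoding the inline export is a one-liner; a sketch using a fabricated payload in place of the real response (the content field name is an assumption):

```python
import base64

# Fabricated scrape_export-style response; real field names may differ.
response = {"format": "csv",
            "content": base64.b64encode(b"title,url\nhello,https://example.com\n").decode()}

csv_text = base64.b64decode(response["content"]).decode()
```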
datalens-mcp-call scrape_pause '{"jobId":"<jobId>"}'
datalens-mcp-call scrape_resume '{"jobId":"<jobId>"}'
datalens-mcp-call scrape_stop '{"jobId":"<jobId>"}'
datalens-mcp-call browser_list_tabs
datalens-mcp-call browser_open_tab '{"url":"https://example.com"}'
datalens-mcp-call browser_use_tab '{"tabId":123}'
datalens-mcp-call browser_close_tab '{"tabId":123}'
Tab management is usually not needed — scrape_detect_tables with a url arg handles tab opening automatically.
Do not call scrape_start without a jobDraft or scraperConfig from a prior scrape_analyze_columns response. Fabricating a scraperConfig will produce wrong results.
Do not skip scrape_analyze_columns and jump straight to scrape_start. The analyze step is required to build the config.
If scrape_detect_tables returns an empty list, the page may need login or may be dynamically loaded. Ask the user to open the target URL in Chrome and scroll to load content, then retry.
If scrape_status stays at QUEUED for more than 30 seconds, check that the Chrome extension is active and that a tab for the target URL is open.
Always use maxRecords: 10 for a preview scrape to confirm the config is correct before running a full job.

# 1. Detect tables on the homepage
datalens-mcp-call scrape_detect_tables '{"url":"https://www.toutiao.com/?is_new_connect=0&is_new_user=0","prompt":"article list"}'
# 2. Analyze columns (fill in selectors from step 1 output)
datalens-mcp-call scrape_analyze_columns '{"rootSelector":"<from step 1>","itemSelector":"<from step 1>","documentInfoPath":"<from step 1>","url":"https://www.toutiao.com/?is_new_connect=0&is_new_user=0","prompt":"article list"}'
# 3. Preview run — first 10 rows (paste the full jobDraft JSON object from step 2)
datalens-mcp-call scrape_start '{"jobDraft":<paste jobDraft>,"maxRecords":10}'
# 4. Poll until status is COMPLETED
datalens-mcp-call scrape_status '{"jobId":"<jobId>","waitMs":3000}'
# 5. Save results to file
datalens-mcp-call scrape_export_to_file '{"jobId":"<jobId>","outputDir":"/tmp/datalens","format":"json"}'
Set DATALENS_TIMEOUT=180000 before running if a tool call takes longer than the default 120 s:
DATALENS_TIMEOUT=180000 datalens-mcp-call scrape_analyze_columns '...'
These are for troubleshooting only. Do not use in normal scraping workflows.
datalens-mcp-call debug_get_logs '{"levels":["error"]}'
datalens-mcp-call debug_clear_logs '{}'
datalens-mcp-call debug_export_logs_to_file '{"outputDir":"/tmp/datalens"}'