Install

openclaw skills install links-to-pdfs

Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.

docs-scraper is a CLI tool that scrapes documents from various sources into local PDF files using browser automation.

npm install -g docs-scraper
Scrape any document URL to PDF:
docs-scraper scrape https://example.com/document
Returns local path: ~/.docs-scraper/output/1706123456-abc123.pdf
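Since each scrape prints its output path, the newest download can also be found directly. A minimal sketch, assuming (from the example filename above) that output names start with a unix timestamp, so a lexicographic sort is chronological; `latest_pdf` is an illustrative helper, not part of docs-scraper:

```shell
# Print the most recently downloaded PDF in the output directory.
latest_pdf() {
  # $1: output directory (defaults to the documented location)
  ls "${1:-$HOME/.docs-scraper/output}"/*.pdf 2>/dev/null | sort | tail -n1
}
```

Calling `latest_pdf` with no argument checks ~/.docs-scraper/output.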
Scrape with daemon (recommended, keeps browser warm):
docs-scraper scrape <url>
Scrape with named profile (for authenticated sites):
docs-scraper scrape <url> -p <profile-name>
Scrape with pre-filled data (e.g., email for DocSend):
docs-scraper scrape <url> -D email=user@example.com
Direct mode (single-shot, no daemon):
docs-scraper scrape <url> --no-daemon
When a document requires authentication (login, email verification, passcode):
Initial scrape returns a job ID:
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123
Retry with data:
docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
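The blocked-job flow above can be scripted. A minimal sketch, assuming a blocked scrape prints the `Job ID: <id>` line shown above and a successful one prints the PDF path; `handle_result` is an illustrative helper, not part of docs-scraper:

```shell
# Decide whether a scrape succeeded or was blocked by parsing its output.
handle_result() {
  if printf '%s\n' "$1" | grep -q 'Job ID:'; then
    # Blocked: extract the job ID so it can be retried with `update`.
    printf 'blocked:%s\n' "$(printf '%s\n' "$1" | sed -n 's/.*Job ID: *//p')"
  else
    # Success: the output is the local PDF path.
    printf 'done:%s\n' "$1"
  fi
}

# out=$(docs-scraper scrape https://docsend.com/view/xxx)
# handle_result "$out"   # "blocked:abc123" -> docs-scraper update abc123 -D email=...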
Profiles store session cookies for authenticated sites.
docs-scraper profiles list # List saved profiles
docs-scraper profiles clear # Clear all profiles
docs-scraper scrape <url> -p myprofile # Use a profile
The daemon keeps browser instances warm for faster scraping.
docs-scraper daemon status # Check status
docs-scraper daemon start # Start manually
docs-scraper daemon stop # Stop daemon
Note: Daemon auto-starts when running scrape commands.
PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.
Manual cleanup:
docs-scraper cleanup # Delete all PDFs
docs-scraper cleanup --older-than 1h # Delete PDFs older than 1 hour
docs-scraper jobs list # List blocked jobs awaiting auth
Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.
DirectPdf
Handles: URLs ending in .pdf
Data fields: None (downloads directly)
Example:
docs-scraper scrape https://example.com/document.pdf
DocSend
Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)
URL patterns:
https://docsend.com/view/{id} or https://docsend.com/v/{id}
https://docsend.com/view/s/{id}
https://{subdomain}.docsend.com/view/{id}
Data fields:
| Field | Type | Description |
|---|---|---|
| email | email | Email address for document access |
| password | password | Passcode/password for protected documents |
| name | text | Your name (required for NDA-gated documents) |
Examples:
# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com
# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123
# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"
# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123
Notes:
Notion
Handles: notion.so/*, *.notion.site/*
Data fields:
| Field | Type | Description |
|---|---|---|
| email | email | Notion account email |
| password | password | Notion account password |
Examples:
# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123
# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
-D email=user@example.com -D password=mypassword
# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123
Notes:
LLM Fallback
Handles: Any URL not matched by other scrapers (automatic fallback)
Data fields: Dynamic - determined by Claude analyzing the page
The LLM scraper uses Claude to analyze the page HTML and detect which input fields the page requires.
Common dynamic fields:
| Field | Type | Description |
|---|---|---|
| email | email | Login email (if detected) |
| password | password | Login password (if detected) |
| username | text | Username (if login uses username) |
Examples:
# Generic webpage (no auth)
docs-scraper scrape https://example.com/article
# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
-D email=user@example.com -D password=secret
# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret
Notes:
Requires the ANTHROPIC_API_KEY environment variable.

| Scraper | email | password | name | Other |
|---|---|---|---|---|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓* | ✓* | - | Dynamic* |
*Fields detected dynamically from page analysis
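The URL-to-scraper routing implied by the "Handles" rules above can be sketched as a shell case statement. This mirrors the documented patterns only; `pick_scraper` is illustrative, the real CLI does this internally:

```shell
# Report which scraper would handle a given URL, per the "Handles" rules.
pick_scraper() {
  case "$1" in
    *.pdf) echo "DirectPdf" ;;
    https://docsend.com/view/*|https://docsend.com/v/*|https://*.docsend.com/*) echo "DocSend" ;;
    https://notion.so/*|https://www.notion.so/*|https://*.notion.site/*) echo "Notion" ;;
    *) echo "LLM Fallback" ;;   # anything unmatched falls through to the LLM scraper
  esac
}
```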
Only needed for LLM fallback scraper:
export ANTHROPIC_API_KEY=your_key
Optional browser settings:
export BROWSER_HEADLESS=true # Set false for debugging
Archive a Notion page:
docs-scraper scrape https://notion.so/My-Page-abc123
Download protected DocSend:
docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234
Batch scraping with profiles:
docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite
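The two commands above generalize to a file of URLs. A minimal sketch (`batch_scrape` and `urls.txt` are illustrative names, not part of docs-scraper):

```shell
# Scrape every URL listed in a file, one per line, reusing one profile.
batch_scrape() {
  # $1: file with one URL per line; $2: profile name
  while IFS= read -r url; do
    docs-scraper scrape "$url" -p "$2"
  done < "$1"
}

# batch_scrape urls.txt mysite
```

Because the daemon stays warm between calls, sequential scrapes like this avoid repeated browser startup.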
Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
Blocked: Job ID + required credential types
docs-scraper daemon stop && docs-scraper daemon start to restart the daemon
docs-scraper jobs list to check pending jobs
docs-scraper cleanup to remove old PDFs