Install
```bash
openclaw skills install indeed-brightdata
```

Search Indeed for job listings and company info via Bright Data's Web Scraper API. Designed for recruiting workflows on messaging platforms (Telegram, Signal) with smart defaults.
Prerequisites:

- `BRIGHTDATA_API_KEY` environment variable must be set
- `curl` and `jq` must be available

```
User wants job info?
├── Has a specific Indeed URL?
│   ├── Job URL (/viewjob?) → indeed_jobs_by_url.sh [SYNC, seconds]
│   ├── Company jobs URL (/cmp/*/jobs) → indeed_jobs_by_company.sh [ASYNC, minutes]
│   └── Company page URL (/cmp/*) → indeed_company_by_url.sh [SYNC, seconds]
├── Wants to search by keyword/location?
│   └── indeed_smart_search.sh [ASYNC, 3-8 min]
│       Agent says: "Searching now, this takes a few minutes."
│       If results < 5: auto-expands date range, do NOT ask user
│       Always pipe output through: indeed_format_results.sh --top 5
├── Wants company info?
│   ├── Has Indeed company URL → indeed_company_by_url.sh [SYNC, seconds]
│   ├── Has keyword → indeed_company_by_keyword.sh [ASYNC, minutes]
│   └── Has industry + state → indeed_company_by_industry.sh [ASYNC, minutes]
└── Check pending results? → indeed_check_pending.sh (run on heartbeat)
```
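The URL branches of the decision tree can be expressed as a small routing function. This is a minimal sketch, not part of the skill: `pick_indeed_script` is a hypothetical helper, and it assumes simple substring patterns on the URL are enough to tell the three page types apart. Pattern order matters, because a company jobs URL (`/cmp/<name>/jobs`) would also match the broader company-page pattern.

```bash
# Hypothetical helper (not shipped with the skill): map user input to the
# script the decision tree recommends. Checked top to bottom, first match wins.
pick_indeed_script() {
  local url=$1
  case $url in
    *"/viewjob?"*) echo "indeed_jobs_by_url.sh" ;;     # job URL, sync
    */cmp/*/jobs*) echo "indeed_jobs_by_company.sh" ;; # company jobs URL, async
    */cmp/*)       echo "indeed_company_by_url.sh" ;;  # company page URL, sync
    *)             echo "indeed_smart_search.sh" ;;    # assumption: not an Indeed URL, so keyword search
  esac
}
```

Treating everything else as a keyword search is an assumption; in that case the agent still has to collect a keyword, country, and location rather than pass the raw input through.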
Always prefer sync (URL-based) scripts when the user provides a URL — they return in seconds.
| Script | Purpose | Mode |
|---|---|---|
| `indeed_smart_search.sh` | Primary job search: keyword expansion, parallel queries, dedup, caching | ASYNC |
| `indeed_jobs_by_url.sh` | Collect job details by URL(s) | SYNC |
| `indeed_jobs_by_keyword.sh` | Low-level single-keyword job search (used internally by smart search) | ASYNC |
| `indeed_jobs_by_company.sh` | Discover jobs from a company jobs page | ASYNC |
| `indeed_company_by_url.sh` | Collect company info by URL | SYNC |
| `indeed_company_by_keyword.sh` | Discover companies by keyword | ASYNC |
| `indeed_company_by_industry.sh` | Discover companies by industry/state | ASYNC |
| `indeed_format_results.sh` | Format JSON results as a summary, full output, or CSV | Local |
| `indeed_check_pending.sh` | Check/fetch completed pending searches + auto-cleanup | Local/API |
| `indeed_poll_and_fetch.sh` | Poll an async job and fetch results (internal) | API |
| `indeed_list_datasets.sh` | List available Indeed dataset IDs | API |
User says: "Find me cybersecurity jobs in New York"

```bash
scripts/indeed_smart_search.sh "cybersecurity" US "New York, NY" \
  | scripts/indeed_format_results.sh --type jobs --top 5
```

User says: "Get details on this job: https://www.indeed.com/viewjob?jk=abc123"

```bash
scripts/indeed_jobs_by_url.sh "https://www.indeed.com/viewjob?jk=abc123"
```
- Always format results with `indeed_format_results.sh`.
- Run `indeed_check_pending.sh` before starting a new search.
- Use the `---SPLIT---` markers from `indeed_format_results.sh` to break output across messages.

```bash
# Basic search (expands keywords, deduplicates, defaults to last 7 days)
scripts/indeed_smart_search.sh "cybersecurity" US "Remote"

# All-time search
scripts/indeed_smart_search.sh "nursing" US "Texas" --all-time

# Skip keyword expansion
scripts/indeed_smart_search.sh "registered nurse" US "Ohio" --no-expand

# Bypass 6-hour cache
scripts/indeed_smart_search.sh "data science" US "New York" --force
```
Output is `{"meta": {...}, "results": [...]}`, where `meta` records the query parameters, keywords used, and result counts.
```bash
# Telegram-friendly summary (default)
scripts/indeed_format_results.sh --type jobs --top 5 results.json

# CSV export
scripts/indeed_format_results.sh --type jobs --format csv results.json

# Companies
scripts/indeed_format_results.sh --type companies --top 5 companies.json

# Pipe from smart search
scripts/indeed_smart_search.sh "nurse" US "Ohio" | scripts/indeed_format_results.sh --top 5
```
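Output from `indeed_format_results.sh` can contain `---SPLIT---` markers where a long reply should be broken into separate messages. A minimal sketch of honoring them, assuming each marker sits on a line of its own; `split_messages` and the `send_message` callback in the usage line are hypothetical names standing in for whatever the messaging integration provides.

```bash
# Hypothetical helper: read formatted text on stdin and pass each chunk found
# between ---SPLIT--- lines to a callback (e.g. a function that sends one message).
# Assumes the marker always occupies a whole line.
split_messages() {
  local send_fn=$1 line chunk="" nl=$'\n'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "---SPLIT---" ]]; then
      [[ -n $chunk ]] && "$send_fn" "$chunk"  # flush the finished chunk
      chunk=""
    else
      chunk+="${chunk:+$nl}$line"             # join lines with newlines
    fi
  done
  [[ -n $chunk ]] && "$send_fn" "$chunk"      # flush text after the last marker
  return 0
}
```

Usage: `scripts/indeed_format_results.sh --type jobs --top 5 results.json | split_messages send_message`. Passing the sender as a callback keeps the splitting logic independent of the platform.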
```bash
scripts/indeed_check_pending.sh
# Output: {"completed":[...],"still_pending":[...],"failed":[...]}
```
Run this periodically (for example, on each heartbeat). If `~/.config/indeed-brightdata/pending.json` exists and is non-empty, check for completed results, format them with `indeed_format_results.sh`, and send them to the user.
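The heartbeat guard described above fits in a few lines. `has_pending` is a hypothetical helper; because the schema of `pending.json` is not documented here, treating an empty JSON array or object as "nothing pending" is an assumption.

```bash
# Hypothetical heartbeat guard: succeed only when pending.json exists and holds
# something other than an empty JSON array or object (schema is assumed).
has_pending() {
  local file=${1:-"$HOME/.config/indeed-brightdata/pending.json"} content
  [[ -s $file ]] || return 1              # missing or zero bytes
  content=$(tr -d '[:space:]' < "$file")  # ignore pretty-printing whitespace
  [[ -n $content && $content != "[]" && $content != "{}" ]]
}
```

On each heartbeat: `has_pending && scripts/indeed_check_pending.sh`.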
| Exit code | Meaning | Agent should... |
|---|---|---|
| 0 | Success — results on stdout | Format and present results |
| 1 | Error — something failed | Report the error |
| 2 | Deferred — still processing, saved to pending | Tell user "results are still processing, I'll follow up" |
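A minimal sketch of acting on these exit codes from a shell wrapper. `run_indeed` is a hypothetical helper, not one of the skill's scripts; it treats a deferral (exit 2) as a normal outcome rather than an error, since the results arrive later through `indeed_check_pending.sh`.

```bash
# Hypothetical wrapper: run one of the skill's scripts and act on its exit code.
# 0 = results on stdout, 2 = deferred to pending.json, anything else = error.
run_indeed() {
  local out status
  out=$("$@")
  status=$?
  case $status in
    0) printf '%s\n' "$out" ;;
    2) echo "Results are still processing, I'll follow up." ;;
    *) echo "Error: ${1##*/} failed with exit code $status" >&2
       return 1 ;;
  esac
}
```

For example: `run_indeed scripts/indeed_jobs_by_url.sh "https://www.indeed.com/viewjob?jk=abc123"`.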
Smart search caches results for 6 hours. Identical searches (same keyword + location + country) return cached results without API calls. Use --force to bypass. Old results (>7 days) are auto-cleaned by indeed_check_pending.sh.
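The cache rule reduces to a key built from the three search parameters plus a 6-hour age check (6 × 3600 = 21,600 seconds). This is a sketch under assumptions: the `|`-joined key format and the helper names `cache_key` and `cache_is_fresh` are hypothetical, and the skill's real `history.json` layout may differ.

```bash
# Hypothetical sketch of the 6-hour cache rule; not the skill's implementation.
CACHE_TTL_SECONDS=$((6 * 60 * 60))  # 6 hours = 21600 seconds

# Identical searches share a key: same keyword + country + location
# (argument order mirrors indeed_smart_search.sh; the "|" format is assumed).
cache_key() {
  printf '%s|%s|%s\n' "$1" "$2" "$3"
}

# Succeeds if an entry created at epoch second $1 is still fresh at $2 (default: now).
cache_is_fresh() {
  local created=$1 now=${2:-$(date +%s)}
  (( now - created < CACHE_TTL_SECONDS ))
}
```

With `--force`, a caller would skip the `cache_is_fresh` check entirely and always hit the API.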
All persistent data is stored under ~/.config/indeed-brightdata/:
| File | Purpose | Lifecycle |
|---|---|---|
| `datasets.json` | Bright Data dataset IDs | Created on first `indeed_list_datasets.sh --save`; rarely changes |
| `pending.json` | In-flight async snapshots | Entries added on poll timeout (exit 2) or fire-and-forget (`--no-wait`); removed when fetched or after 24h |
| `history.json` | Search cache index | Entries added per search; auto-cleaned after 7 days |
| `results/*.json` | Fetched result data | Written when snapshots complete; auto-cleaned after 7 days |
Auto-cleanup runs at the start of indeed_check_pending.sh. No data is sent anywhere other than the Bright Data API.
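The 7-day cleanup of `results/*.json` can be approximated with `find`. This is a sketch only, assuming a file's modification time reflects when the result was written; `cleanup_results` is a hypothetical name, and `indeed_check_pending.sh` may decide age from `history.json` instead.

```bash
# Hypothetical sketch of the 7-day cleanup for cached result files.
# Assumes mtime = time the result was written.
cleanup_results() {
  local dir=${1:-"$HOME/.config/indeed-brightdata/results"}
  [[ -d $dir ]] || return 0  # nothing cached yet
  # -mmin +10080: last modified more than 7 * 24 * 60 minutes ago
  find "$dir" -maxdepth 1 -type f -name '*.json' -mmin +$((7 * 24 * 60)) -delete
}
```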
All scripts source `scripts/_lib.sh` for shared HTTP and persistence functions. The library uses:

- Base URL: `https://api.brightdata.com/datasets/v3`
- Auth: `BRIGHTDATA_API_KEY`, sent in an `Authorization: Bearer` header
- Storage: `~/.config/indeed-brightdata/` (see the file table above)

See `references/api-reference.md` for complete endpoint documentation, response schemas, and country/domain mappings.

See `references/keyword-expansions.json` for the lookup table of keyword-to-job-title mappings.
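Async requests produce a snapshot that has to be polled until its data is ready. Below is a minimal polling sketch against the base URL and Bearer auth above. It assumes a `progress/<snapshot_id>` endpoint whose JSON includes `"status":"ready"` or `"status":"failed"`; check `references/api-reference.md` for the exact schema. `bd_fetch` and `poll_snapshot` are hypothetical helpers, not the skill's `indeed_poll_and_fetch.sh`. Returning 2 on timeout mirrors the skill's "deferred" exit code.

```bash
BD_BASE_URL="https://api.brightdata.com/datasets/v3"

# Hypothetical helper: GET a path under the base URL with Bearer auth.
# Tests can stub it by redefining the function.
bd_fetch() {
  curl -sS -H "Authorization: Bearer ${BRIGHTDATA_API_KEY:?BRIGHTDATA_API_KEY is not set}" \
    "${BD_BASE_URL}/$1"
}

# Hypothetical poller: 0 = ready, 1 = failed, 2 = still running after all attempts.
# Assumes the progress response carries a "status" field (endpoint path is assumed).
poll_snapshot() {
  local snapshot_id=$1 max_attempts=${2:-10} interval=${3:-30} attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$(bd_fetch "progress/${snapshot_id}" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z_]*\)".*/\1/p')
    case $status in
      ready) return 0 ;;
      failed) return 1 ;;
    esac
    (( attempt < max_attempts )) && sleep "$interval"  # no sleep after the last try
  done
  return 2  # timed out: a caller would record the snapshot as pending
}
```

The sketch parses the status with `sed` to stay dependency-free; since the skill already requires `jq`, that is the more robust choice for real responses.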