Headless Brave Browser

v0.2.0

Headless web search and content extraction via the Brave Search API. Features exponential-backoff retry, circuit breaker fault isolation, bounded-concurrency...

1 version · License: MIT-0
by Franklin Kelechi (@kelexine)

Install

openclaw skills install brave-headless

brave-search

Headless web search and content extraction via the Brave Search API.

Setup

Run once before first use:

cd <skill-root>
npm ci

Required environment variable:

export BRAVE_API_KEY="your-key-here"

Get a free API key at brave.com/search/api.

Usage

Search

node scripts/search.js "query"                        # Basic (5 results)
node scripts/search.js "query" -n 10                  # 10 results (maximum 20)
node scripts/search.js "query" --content              # Include page content
node scripts/search.js "query" -n 3 --content         # Combined
node scripts/search.js "query" --json                 # Newline-delimited JSON
node scripts/search.js --help                         # Full options + env vars

Extract page content

node scripts/content.js https://example.com/article
node scripts/content.js https://example.com/article --json
node scripts/content.js https://example.com/article --max-length 8000

Output format (plain text)

--- Result 1 ---
Title:   Page Title
URL:     https://example.com/page
Snippet: Description from Brave Search
Content:
  # Page Title

  Extracted markdown content...

--- Result 2 ---
...

Pass --json to get one JSON object per line instead, suitable for piping.
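
Because --json emits one JSON object per line, a consumer can split the output on newlines and parse each line on its own. The following is a minimal sketch of that idea. The helper name parseNdjson is hypothetical and not part of the skill, and the sample data is made up: the fields inside each result object are not documented here, so the sketch assumes none.

```javascript
// Hypothetical helper: parse newline-delimited JSON (one object per line).
// No field names are assumed, since the output schema is not documented above.
function parseNdjson(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines and the trailing newline
    .map((line) => JSON.parse(line)); // throws SyntaxError on a malformed line
}

// Made-up sample input, only to show the shape of the call.
const sample = '{"n":1}\n{"n":2}\n';
console.log(parseNdjson(sample).length); // 2
```

Logs are written to stderr (see LOG_JSON under Configuration), so reading only stdout should keep log lines out of the parsed results.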

Exit codes

Code  Meaning
0     Success
1     Invalid input or configuration error
2     Page had no extractable content (content.js)
130   Interrupted (SIGINT)
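
A calling script can branch on these codes. The sketch below maps each documented code to a short message. The names EXIT_CODES and describeExitCode are hypothetical, and the messages are paraphrased from the table above rather than taken from the skill's source.

```javascript
// Exit codes as documented above; the message wording is paraphrased.
const EXIT_CODES = new Map([
  [0, "success"],
  [1, "invalid input or configuration error"],
  [2, "page had no extractable content (content.js)"],
  [130, "interrupted (SIGINT)"],
]);

// Hypothetical helper: turn a child process exit status into a message.
function describeExitCode(code) {
  return EXIT_CODES.get(code) ?? `unexpected exit code ${code}`;
}

console.log(describeExitCode(2));
```

With Node's child_process.spawnSync, the value to pass in is the status property of the returned object. It is null when the child was terminated by a signal, which this sketch reports as unexpected.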

Configuration (environment variables)

All behaviour is configurable without touching code:

Variable              Default  Description
BRAVE_API_KEY         (none)   Required. Brave Search subscription token
LOG_LEVEL             info     debug · info · warn · error · silent
LOG_JSON              false    Emit logs as newline-delimited JSON to stderr
FETCH_TIMEOUT_MS      15000    Per-page fetch timeout
SEARCH_TIMEOUT_MS     10000    Brave API call timeout
MAX_CONTENT_LENGTH    5000     Max chars of extracted content
MAX_RETRY_ATTEMPTS    3        Retry attempts on transient errors
RETRY_BASE_DELAY_MS   500      Base delay for exponential backoff
RETRY_MAX_DELAY_MS    30000    Backoff delay cap
CONCURRENCY_LIMIT     3        Parallel page fetches when --content is set
CB_FAILURE_THRESHOLD  5        Consecutive failures before circuit opens
CB_RESET_TIMEOUT_MS   60000    Circuit breaker reset window

All variables are validated at startup. A misconfigured run fails immediately with a descriptive list of every bad value rather than crashing mid-execution.
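
The "collect every bad value, then fail once" behaviour can be sketched as follows. This is not the skill's config.js. The SCHEMA object, the loadConfig name, the minimum values, and the error wording are all assumptions made for illustration, and only three of the numeric variables are covered.

```javascript
// Hypothetical schema for three variables; defaults come from the table above,
// the `min` bounds are assumptions.
const SCHEMA = {
  FETCH_TIMEOUT_MS: { default: 15000, min: 1 },
  MAX_RETRY_ATTEMPTS: { default: 3, min: 0 },
  CONCURRENCY_LIMIT: { default: 3, min: 1 },
};

// Validate everything first, then throw a single error listing all problems.
function loadConfig(env) {
  const config = {};
  const problems = [];

  if (env.BRAVE_API_KEY) {
    config.BRAVE_API_KEY = env.BRAVE_API_KEY;
  } else {
    problems.push("BRAVE_API_KEY is required");
  }

  for (const [name, rule] of Object.entries(SCHEMA)) {
    const raw = env[name];
    if (raw === undefined || raw === "") {
      config[name] = rule.default; // unset: fall back to the default
      continue;
    }
    const value = Number(raw);
    if (!Number.isInteger(value) || value < rule.min) {
      problems.push(`${name} must be an integer >= ${rule.min}, got "${raw}"`);
    } else {
      config[name] = value;
    }
  }

  if (problems.length > 0) {
    throw new Error("Invalid configuration:\n- " + problems.join("\n- "));
  }
  return config;
}
```

Calling loadConfig(process.env) once at startup means a typo in any variable is reported before the first network request is made.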

Architecture

See references/ARCHITECTURE.md for a full module breakdown.

scripts/
├── search.js            ← Search CLI entry point
├── content.js           ← Content extraction CLI entry point
├── content-fetcher.js   ← HTTP fetch + Readability + DOM fallback
├── config.js            ← Schema-validated env config
├── circuit-breaker.js   ← Fault isolation (CLOSED → OPEN → HALF_OPEN)
├── retry.js             ← Exponential backoff with full jitter
├── concurrency.js       ← Bounded parallel execution pool
├── utils.js             ← htmlToMarkdown, smartTruncate, parseURL
├── logger.js            ← Structured leveled logger → stderr
└── errors.js            ← Typed error hierarchy
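
retry.js is described as exponential backoff with full jitter. The skill's own code is not shown here, so the following is only a sketch of that general technique: the delay before each retry is drawn uniformly between zero and an exponentially growing ceiling that is capped. The names backoffDelay and retry are hypothetical. The defaults mirror RETRY_BASE_DELAY_MS, RETRY_MAX_DELAY_MS, and MAX_RETRY_ATTEMPTS, and treating the attempt count as total tries (not retries after the first try) is an assumption.

```javascript
// Full jitter: delay is uniform in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the calculation can be checked deterministically.
function backoffDelay(attempt, baseMs = 500, capMs = 30000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical retry wrapper. It retries on ANY error; a real implementation
// would presumably retry only transient ones.
async function retry(fn, { attempts = 3, sleep = (ms) => new Promise((r) => setTimeout(r, ms)) } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // No sleep after the final failed attempt.
      if (attempt < attempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Randomizing the whole delay, rather than adding a small random offset to a fixed schedule, spreads out retries from clients that failed at the same moment.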

Version tags

latest  vk97eczdxsjjeq41v8h3z0az2j581fbjx

Runtime requirements

🔍 Clawdis
OS           macOS · Linux
Bins         node, npm
Env          BRAVE_API_KEY
Primary env  BRAVE_API_KEY

Install

Node  npm i -g @mozilla/readability
Node  npm i -g jsdom
Node  npm i -g turndown
Node  npm i -g turndown-plugin-gfm