Install
openclaw skills install @zeeshanadilbutt/olostep
Fetch live web content via the Olostep API. Covers scraping, searching, crawling, batch processing, site mapping, AI-powered answers, and structured data extraction.
Authentication: Every request needs Authorization: Bearer $OLOSTEP_API_KEY. If the env var is missing, stop and tell the user to set it. Get a free key (500 req/month) at https://olostep.com/auth.
Base URL: https://api.olostep.com/v1
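The authentication rule above can be sketched as a small helper that fails early when the key is missing. The function name `auth_headers` and its `env` parameter are illustrative and not part of the Olostep API; only the header format and the `OLOSTEP_API_KEY` variable come from the text above.

```python
import os


def auth_headers(env=None):
    """Build request headers, stopping early if the API key is missing.

    `env` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    api_key = (env.get("OLOSTEP_API_KEY") or "").strip()
    if not api_key:
        # Mirror the documented behaviour: stop and tell the user to set the key.
        raise RuntimeError(
            "OLOSTEP_API_KEY is not set. Get a key at https://olostep.com/auth "
            "and export it before making requests."
        )
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed to whichever HTTP client is in use.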
Extract content from any URL as markdown, HTML, JSON, or text. Handles JavaScript rendering and anti-bot protections automatically.
curl -sS -X POST "https://api.olostep.com/v1/scrapes" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url_to_scrape": "https://example.com/page",
"formats": ["markdown"]
}'
Response: Content is in result.markdown_content (or result.html_content, result.text_content, result.json_content depending on requested formats).
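As a rough illustration of where the content lives in the response, the sketch below maps a requested format to the field named above. It assumes the response body has already been decoded into a Python dict; `extract_content` is a hypothetical helper name, and `screenshot` is left out because the text above does not say which field carries it.

```python
# Field names taken from the response description above.
CONTENT_KEYS = {
    "markdown": "markdown_content",
    "html": "html_content",
    "text": "text_content",
    "json": "json_content",
}


def extract_content(response, fmt="markdown"):
    """Return the content for `fmt` from a decoded scrape response, or None."""
    if fmt not in CONTENT_KEYS:
        raise ValueError(f"no known content field for format: {fmt}")
    # Content is nested under "result", not at the top level.
    result = response.get("result") or {}
    return result.get(CONTENT_KEYS[fmt])
```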
| Parameter | Required | Default | Description |
|---|---|---|---|
| url_to_scrape | Yes | - | URL to scrape |
| formats | Yes | - | Array: markdown, html, text, json, screenshot |
| country | No | - | Country code for geo-targeted scraping (US, GB, IN) |
| wait_before_scraping | No | 0 | Milliseconds to wait for JS rendering (0–10000) |
| parser | No | - | Parser object, e.g. {"id": "@olostep/google-search"}, for structured JSON |
| llm_extract | No | - | Object with schema for LLM-based extraction |
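The parameter table can be turned into a small payload builder that checks values locally before a request is sent. This is a minimal sketch: `build_scrape_payload` and its keyword names (`wait_ms`, `parser_id`) are hypothetical, while the JSON field names and limits come from the table above.

```python
ALLOWED_FORMATS = {"markdown", "html", "text", "json", "screenshot"}


def build_scrape_payload(url, formats=("markdown",), country=None,
                         wait_ms=0, parser_id=None):
    """Build the JSON body for POST /v1/scrapes, validating documented limits."""
    formats = list(formats)
    unknown = set(formats) - ALLOWED_FORMATS
    if not formats or unknown:
        raise ValueError(f"formats must be a non-empty subset of {sorted(ALLOWED_FORMATS)}")
    if not 0 <= wait_ms <= 10000:
        raise ValueError("wait_before_scraping must be between 0 and 10000 ms")

    payload = {"url_to_scrape": url, "formats": formats}
    # Optional fields are only sent when they were actually set.
    if country:
        payload["country"] = country
    if wait_ms:
        payload["wait_before_scraping"] = wait_ms
    if parser_id:
        payload["parser"] = {"id": parser_id}
    return payload
```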
When to use: Single page extraction — docs, articles, product pages, profiles.
Tips:
- formats: ["markdown"] is the most token-efficient choice for LLM processing.
- wait_before_scraping: 2000 gives JavaScript-heavy pages two seconds to render.

Search Google by scraping a Google URL with the @olostep/google-search parser. There is no separate search endpoint; search goes through /v1/scrapes.
curl -sS -X POST "https://api.olostep.com/v1/scrapes" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url_to_scrape": "https://www.google.com/search?q=best+AI+coding+tools+2026&gl=us",
"formats": ["json"],
"parser": {"id": "@olostep/google-search"}
}'
Response: result.json_content is a JSON-encoded string. Parse it to get organic (an array of {title, link, snippet}), knowledgeGraph, peopleAlsoAsk, and relatedSearches.
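Because json_content arrives as a string, it needs a second decoding step before the results can be read. The sketch below assumes the HTTP response has already been decoded into a dict; `parse_search_results` is a hypothetical helper, and the `isinstance` check is a defensive assumption in case the field is ever delivered as an object.

```python
import json


def parse_search_results(response):
    """Return the organic results from a decoded google-search scrape response."""
    raw = response["result"]["json_content"]
    # json_content is documented as a string, so decode it a second time.
    data = json.loads(raw) if isinstance(raw, str) else raw
    return [
        {
            "title": item.get("title"),
            "link": item.get("link"),
            "snippet": item.get("snippet"),
        }
        for item in data.get("organic", [])
    ]
```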
How to build the Google URL:
- Start from https://www.google.com/search?q=YOUR+QUERY.
- Add &gl=us to set the country (ISO codes: us, gb, de, in).
- Encode spaces in the query as +.

When to use: Research, finding docs, competitive analysis, debugging errors.
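Building the Google URL by hand is error-prone once queries contain spaces or punctuation. A minimal sketch using the standard library, where `google_search_url` is an illustrative name rather than part of the skill:

```python
from urllib.parse import urlencode


def google_search_url(query, country="us"):
    """Build a Google search URL with the query and country (gl) parameters."""
    # urlencode escapes reserved characters and encodes spaces as "+".
    return "https://www.google.com/search?" + urlencode({"q": query, "gl": country})
```

The result is the value to pass as url_to_scrape together with the @olostep/google-search parser.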
Async crawl that discovers and scrapes pages by following links. Poll for results.
# Start crawl
curl -sS -X POST "https://api.olostep.com/v1/crawls" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"start_url": "https://docs.example.com",
"max_pages": 10
}'
# Check status (poll until status is "completed")
curl -sS "https://api.olostep.com/v1/crawls/CRAWL_ID" \
-H "Authorization: Bearer $OLOSTEP_API_KEY"
# Get pages (once completed)
curl -sS "https://api.olostep.com/v1/crawls/CRAWL_ID/pages?limit=10" \
-H "Authorization: Bearer $OLOSTEP_API_KEY"
Each page in the response carries a retrieve_id. Use /v1/retrieve?retrieve_id=ID&formats=markdown to get its content.
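The start, poll, fetch sequence above can be sketched as a generic polling loop. To stay independent of any HTTP client, the loop takes a `fetch_status` callable (hypothetical) that returns the decoded status body. Only the "completed" status is documented above, so this sketch treats every other value as "still running" and relies on a timeout; the interval and timeout defaults are arbitrary assumptions.

```python
import time


def poll_until_completed(fetch_status, interval_s=2.0, timeout_s=300.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until its "status" is "completed", then return the body."""
    deadline = clock() + timeout_s
    while True:
        body = fetch_status()
        if body.get("status") == "completed":
            return body
        if clock() >= deadline:
            raise TimeoutError(f"job not completed after {timeout_s} seconds")
        # Wait before the next status check to avoid hammering the API.
        sleep(interval_s)
```

The same loop should apply to batches, since they expose the same "poll until completed" pattern.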
| Parameter | Required | Default | Description |
|---|---|---|---|
| start_url | Yes | - | Starting URL |
| max_pages | Yes | - | Maximum pages to crawl |
| include_urls | No | ["/**"] | Glob patterns to include (["/blog/**"]) |
| exclude_urls | No | - | Glob patterns to exclude (["/admin/**"]) |
| max_depth | No | - | Maximum link depth from start URL |
When to use: Ingesting docs sites, blog archives, product catalogs.
Scrape up to 10,000 URLs in one parallel batch. Async — poll for results.
curl -sS -X POST "https://api.olostep.com/v1/batches" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"items": [
{"url": "https://example.com/page1", "custom_id": "page1"},
{"url": "https://example.com/page2", "custom_id": "page2"}
]
}'
# Check status
curl -sS "https://api.olostep.com/v1/batches/BATCH_ID" \
-H "Authorization: Bearer $OLOSTEP_API_KEY"
# Get results (once completed)
curl -sS "https://api.olostep.com/v1/batches/BATCH_ID/items?limit=20" \
-H "Authorization: Bearer $OLOSTEP_API_KEY"
Each item carries a retrieve_id. Use /v1/retrieve?retrieve_id=ID&formats=markdown to get its content.
When to use: Large-scale extraction — product pages, directories, documentation sets.
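For URL lists that may exceed the 10,000-item limit, one option is to split them into several batch payloads before submitting. This is a sketch under stated assumptions: `build_batch_payloads` is a hypothetical helper, and the `item-N` naming for custom_id is an arbitrary choice made here so results can be matched back to inputs.

```python
def build_batch_payloads(urls, max_per_batch=10_000):
    """Split urls into JSON bodies for POST /v1/batches, one body per chunk."""
    payloads = []
    for start in range(0, len(urls), max_per_batch):
        chunk = urls[start:start + max_per_batch]
        items = [
            # custom_id encodes the URL's position in the original list.
            {"url": url, "custom_id": f"item-{start + offset}"}
            for offset, url in enumerate(chunk)
        ]
        payloads.append({"items": items})
    return payloads
```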
Discover all URLs on a site without scraping content. Synchronous — returns immediately.
curl -sS -X POST "https://api.olostep.com/v1/maps" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"include_urls": ["/blog/**"],
"top_n": 50
}'
Response: urls array of discovered URLs, urls_count total.
| Parameter | Required | Default | Description |
|---|---|---|---|
| url | Yes | - | Website to map |
| search_query | No | - | Sort URLs by relevance |
| top_n | No | - | Limit number of URLs |
| include_urls | No | - | Glob patterns to include |
| exclude_urls | No | - | Glob patterns to exclude |
When to use: Site analysis, content auditing, planning before crawl/batch.
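Since every map parameter except url is optional, a payload builder can simply omit whatever was not set. A minimal sketch, with `build_map_payload` as an illustrative name and the field names taken from the table above:

```python
def build_map_payload(url, search_query=None, top_n=None,
                      include_urls=None, exclude_urls=None):
    """Build the JSON body for POST /v1/maps, leaving out unset optional fields."""
    optional = {
        "search_query": search_query,
        "top_n": top_n,
        "include_urls": include_urls,
        "exclude_urls": exclude_urls,
    }
    payload = {"url": url}
    # Only include the optional fields the caller actually provided.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```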
Web-sourced answers with citations. Optionally provide a JSON schema for structured output. Synchronous.
curl -sS -X POST "https://api.olostep.com/v1/answers" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"task": "What are the top 5 AI agent frameworks in 2026?"
}'
With structured output:
curl -sS -X POST "https://api.olostep.com/v1/answers" \
-H "Authorization: Bearer $OLOSTEP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"task": "Find the founders and funding of Olostep",
"json_format": {"company": "", "founders": [], "total_funding": "", "last_round": ""}
}'
Response: result.json_content matches your schema. result.sources lists URLs used.
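The request and response shapes for answers can be sketched as two small helpers. Both names are hypothetical. Whether json_content arrives here as an object or as a JSON-encoded string is not stated above, so the reader accepts either form; that is an assumption, not documented behaviour.

```python
import json


def build_answer_payload(task, json_format=None):
    """Build the JSON body for POST /v1/answers."""
    payload = {"task": task}
    if json_format is not None:
        payload["json_format"] = json_format
    return payload


def read_answer(response):
    """Return (content, sources) from a decoded answers response."""
    result = response.get("result") or {}
    content = result.get("json_content")
    if isinstance(content, str):
        try:
            # Assumption: a string value may itself be JSON-encoded.
            content = json.loads(content)
        except json.JSONDecodeError:
            pass  # Plain-text answer; keep the string as-is.
    return content, list(result.get("sources") or [])
```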
When to use: Research, fact-checking, competitive analysis, structured web intelligence.
Crawl and batch results return retrieve_id per item. Get actual content with:
curl -sS "https://api.olostep.com/v1/retrieve?retrieve_id=RETRIEVE_ID&formats=markdown" \
-H "Authorization: Bearer $OLOSTEP_API_KEY"
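When looping over many crawl or batch results, it can help to build the retrieve URL programmatically so IDs are always escaped correctly. A minimal sketch; `retrieve_url` is an illustrative helper, and the base URL and query parameters are the ones shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.olostep.com/v1"


def retrieve_url(retrieve_id, formats="markdown"):
    """Build the GET URL that fetches content for one retrieve_id."""
    query = urlencode({"retrieve_id": retrieve_id, "formats": formats})
    return f"{BASE_URL}/retrieve?{query}"
```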
Parsers for structured JSON
Use with "parser": {"id": "PARSER_ID"} and "formats": ["json"]:
| Parser ID | Use Case |
|---|---|
| @olostep/google-search | Google SERP (organic, knowledge graph) |
| @olostep/amazon-it-product | Amazon product pages |
| @olostep/extract-emails | Email addresses from pages |
| @olostep/extract-socials | Social media links |
- Check that $OLOSTEP_API_KEY is set before making requests.
- Prefer formats: ["markdown"]; it is the most efficient for LLM context.
- Read scraped content from result.markdown_content (it is not a top-level field).