Olostep

v1.0.3

Scrape webpages, search Google, crawl sites, batch-scrape up to 10k URLs, map site structure, and get AI-powered answers with citations using the Olostep Web...

by Zeeshan Adil (@zeeshanadilbutt)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for zeeshanadilbutt/olostep.

Prompt Preview: Install & Setup
Install the skill "Olostep" (zeeshanadilbutt/olostep) from ClawHub.
Skill page: https://clawhub.ai/zeeshanadilbutt/olostep
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install olostep

ClawHub CLI


npx clawhub@latest install olostep
Security findings were reviewed by staff and cleared for public use.

158 downloads · 0 stars · 4 versions · updated 1 week ago
License: MIT-0

Olostep — Web Data API for AI Agents

Fetch live web content via the Olostep API. Covers scraping, searching, crawling, batch processing, site mapping, AI-powered answers, and structured data extraction.

Authentication: Every request needs Authorization: Bearer $OLOSTEP_API_KEY. If the env var is missing, stop and tell the user to set it. Get a free key (500 req/month) at https://olostep.com/auth.

Base URL: https://api.olostep.com/v1


1. Scrape a Single Page

Extract content from any URL as markdown, HTML, JSON, or text. Handles JavaScript rendering and anti-bot protections automatically.

curl -sS -X POST "https://api.olostep.com/v1/scrapes" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url_to_scrape": "https://example.com/page",
    "formats": ["markdown"]
  }'

Response: Content is in result.markdown_content (or result.html_content, result.text_content, result.json_content depending on requested formats).

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| url_to_scrape | Yes | | URL to scrape |
| formats | Yes | | Array: markdown, html, text, json, screenshot |
| country | No | | Country code for geo-targeted scraping (US, GB, IN) |
| wait_before_scraping | No | 0 | Milliseconds to wait for JS rendering (0–10000) |
| parser | No | | Parser object {"id": "@olostep/google-search"} for structured JSON |
| llm_extract | No | | Object with schema for LLM-based extraction |

When to use: Single page extraction — docs, articles, product pages, profiles.

Tips:

  • Default to formats: ["markdown"] — most token-efficient for LLM processing
  • For JavaScript-heavy SPAs, set wait_before_scraping: 2000
  • Use parsers for structured JSON from known sites (see Parsers section)
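The curl call above can also be issued from Python. A minimal stdlib-only sketch (the helper names are illustrative, not part of any official Olostep client):

```python
import json
import urllib.request

API_BASE = "https://api.olostep.com/v1"

def build_scrape_payload(url, formats=("markdown",), wait_ms=0):
    """Assemble the /v1/scrapes request body from the parameters above."""
    payload = {"url_to_scrape": url, "formats": list(formats)}
    if wait_ms:
        payload["wait_before_scraping"] = wait_ms  # for JavaScript-heavy pages
    return payload

def scrape(url, api_key, **kwargs):
    """POST the payload and return result.markdown_content."""
    req = urllib.request.Request(
        f"{API_BASE}/scrapes",
        data=json.dumps(build_scrape_payload(url, **kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["result"]["markdown_content"]
```

Calling `scrape(...)` requires a live `$OLOSTEP_API_KEY`; `build_scrape_payload` is pure and can be inspected before sending.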

2. Search Google

Search Google by scraping a Google URL with the @olostep/google-search parser. No separate search endpoint — search goes through /v1/scrapes.

curl -sS -X POST "https://api.olostep.com/v1/scrapes" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url_to_scrape": "https://www.google.com/search?q=best+AI+coding+tools+2026&gl=us",
    "formats": ["json"],
    "parser": {"id": "@olostep/google-search"}
  }'

Response: result.json_content is a JSON-encoded string. Parse it to get organic (an array of {title, link, snippet}), plus knowledgeGraph, peopleAlsoAsk, and relatedSearches.

How to build the Google URL:

  • Base: https://www.google.com/search?q=YOUR+QUERY
  • Add &gl=us for country (ISO codes: us, gb, de, in)
  • URL-encode the query (spaces become +)
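The URL rules above, plus the JSON-string decoding of the response, fit in a few lines of Python (helper names are illustrative):

```python
import json
from urllib.parse import urlencode

def google_search_url(query, gl="us"):
    """Build the Google SERP URL; urlencode turns spaces into '+'."""
    return "https://www.google.com/search?" + urlencode({"q": query, "gl": gl})

def parse_search_result(response_body):
    """result.json_content arrives as a JSON *string*; decode it once more."""
    return json.loads(response_body["result"]["json_content"])
```

Pass `google_search_url(...)` as `url_to_scrape` with the `@olostep/google-search` parser, then read organic results from `parse_search_result(body)["organic"]`.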

When to use: Research, finding docs, competitive analysis, debugging errors.


3. Crawl a Website

Async crawl that discovers and scrapes pages by following links. Poll for results.

# Start crawl
curl -sS -X POST "https://api.olostep.com/v1/crawls" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "start_url": "https://docs.example.com",
    "max_pages": 10
  }'
# Check status (poll until status is "completed")
curl -sS "https://api.olostep.com/v1/crawls/CRAWL_ID" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY"
# Get pages (once completed)
curl -sS "https://api.olostep.com/v1/crawls/CRAWL_ID/pages?limit=10" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY"

Pages return retrieve_id per page. Use /v1/retrieve?retrieve_id=ID&formats=markdown to get content.
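The start/poll/fetch cycle can be wrapped in a small polling loop. A sketch, assuming the status endpoint returns a JSON object with a `status` field as implied above:

```python
import json
import time
import urllib.request

API_BASE = "https://api.olostep.com/v1"

def crawl_is_done(status_body):
    """Per the docs above, poll until status is 'completed'."""
    return status_body.get("status") == "completed"

def wait_for_crawl(crawl_id, api_key, interval=5, timeout=300):
    """Poll GET /v1/crawls/{id} until completed or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            f"{API_BASE}/crawls/{crawl_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        if crawl_is_done(body):
            return body
        time.sleep(interval)
    raise TimeoutError(f"crawl {crawl_id} did not complete in {timeout}s")
```

The interval and timeout values are arbitrary defaults; tune them to the size of the crawl.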

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| start_url | Yes | | Starting URL |
| max_pages | Yes | | Maximum pages to crawl |
| include_urls | No | ["/**"] | Glob patterns to include (["/blog/**"]) |
| exclude_urls | No | | Glob patterns to exclude (["/admin/**"]) |
| max_depth | No | | Maximum link depth from start URL |

When to use: Ingesting docs sites, blog archives, product catalogs.


4. Batch Scrape URLs

Scrape up to 10,000 URLs in one parallel batch. Async — poll for results.

curl -sS -X POST "https://api.olostep.com/v1/batches" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {"url": "https://example.com/page1", "custom_id": "page1"},
      {"url": "https://example.com/page2", "custom_id": "page2"}
    ]
  }'
# Check status
curl -sS "https://api.olostep.com/v1/batches/BATCH_ID" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY"
# Get results (once completed)
curl -sS "https://api.olostep.com/v1/batches/BATCH_ID/items?limit=20" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY"

Items return retrieve_id. Use /v1/retrieve?retrieve_id=ID&formats=markdown for content.
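Building the items array by hand gets tedious past a handful of URLs. A small helper (the prefix-index custom_id scheme is just one convention, not required by the API):

```python
def build_batch_items(urls, prefix="item"):
    """Build the items array for POST /v1/batches.

    custom_id lets you match each result back to its input URL;
    any unique string works.
    """
    return [
        {"url": url, "custom_id": f"{prefix}-{i}"}
        for i, url in enumerate(urls, start=1)
    ]
```

The resulting list goes under the `items` key of the request body, up to 10,000 entries per batch.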

When to use: Large-scale extraction — product pages, directories, documentation sets.


5. Map a Website

Discover all URLs on a site without scraping content. Synchronous — returns immediately.

curl -sS -X POST "https://api.olostep.com/v1/maps" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "include_urls": ["/blog/**"],
    "top_n": 50
  }'

Response: urls array of discovered URLs, urls_count total.
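After mapping, you often want to narrow the `urls` array further on the client side before crawling or batching. A rough sketch using fnmatch (an approximation of the server-side glob behavior, not a re-implementation of it):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def filter_urls(maps_response, pattern):
    """Client-side filter over the `urls` array returned by /v1/maps.

    Matches the glob against the URL path, mirroring patterns like "/blog/**".
    Note fnmatch's "*" also matches "/", so "/blog/*" and "/blog/**" behave
    alike here.
    """
    return [
        u for u in maps_response["urls"]
        if fnmatch(urlparse(u).path, pattern)
    ]
```

This is handy when one map call feeds several differently scoped batch jobs.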

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| url | Yes | | Website to map |
| search_query | No | | Sort URLs by relevance |
| top_n | No | | Limit number of URLs |
| include_urls | No | | Glob patterns to include |
| exclude_urls | No | | Glob patterns to exclude |

When to use: Site analysis, content auditing, planning before crawl/batch.


6. AI-Powered Answers

Web-sourced answers with citations. Optionally provide JSON schema for structured output. Synchronous.

curl -sS -X POST "https://api.olostep.com/v1/answers" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "What are the top 5 AI agent frameworks in 2026?"
  }'

With structured output:

curl -sS -X POST "https://api.olostep.com/v1/answers" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Find the founders and funding of Olostep",
    "json_format": {"company": "", "founders": [], "total_funding": "", "last_round": ""}
  }'

Response: result.json_content matches your schema. result.sources lists URLs used.
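When you pass a json_format schema, a quick sanity check that the answer actually covers every requested field can save a downstream failure. A sketch (the string-vs-object handling is defensive, since the docs don't pin down which form json_content takes here):

```python
import json

def answer_matches_schema(response_body, schema):
    """Check that result.json_content has every key the schema requested."""
    data = response_body["result"]["json_content"]
    if isinstance(data, str):  # some endpoints return a JSON string
        data = json.loads(data)
    return set(schema) <= set(data)
```

If this returns False, retry with a more specific task description rather than trusting partial output.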

When to use: Research, fact-checking, competitive analysis, structured web intelligence.


7. Retrieve Content by ID

Crawl and batch results return retrieve_id per item. Get actual content with:

curl -sS "https://api.olostep.com/v1/retrieve?retrieve_id=RETRIEVE_ID&formats=markdown" \
  -H "Authorization: Bearer $OLOSTEP_API_KEY"

Common Workflows

Research a topic

  1. Search Google → find sources
  2. Scrape top results → get full content
  3. Synthesize into deliverable

Ingest documentation

  1. Map the docs site → discover URLs
  2. Batch or Crawl relevant sections
  3. Retrieve content by ID

Debug an error

  1. Search the exact error message (in quotes)
  2. Scrape GitHub issues or Stack Overflow answers
  3. Apply the fix

Extract structured data at scale

  1. Map to find all product/listing URLs
  2. Batch with parser for structured JSON
  3. Retrieve and process results
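The workflows above are just compositions of the endpoints. A dependency-injected sketch of the "research a topic" flow, where `search` and `scrape` are whatever callables wrap the HTTP calls shown earlier (injecting them keeps the workflow testable without network access):

```python
def research(topic, search, scrape, top_k=3):
    """Search, scrape the top results, return (url, content) pairs.

    `search(query)` should return the parsed SERP dict (with an "organic"
    list of {title, link, snippet}); `scrape(url)` should return page
    content, e.g. markdown.
    """
    results = search(topic)["organic"][:top_k]
    return [(r["link"], scrape(r["link"])) for r in results]
```

The same shape works for the other workflows: map feeds batch, batch feeds retrieve.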

Available Parsers

Use with "parser": {"id": "PARSER_ID"} and "formats": ["json"]:

| Parser ID | Use Case |
|-----------|----------|
| @olostep/google-search | Google SERP (organic, knowledge graph) |
| @olostep/amazon-it-product | Amazon product pages |
| @olostep/extract-emails | Email addresses from pages |
| @olostep/extract-socials | Social media links |

Rules

  • Always check $OLOSTEP_API_KEY is set before making requests.
  • Default to formats: ["markdown"] — most efficient for LLM context.
  • Content is inside result.markdown_content (not a top-level field).
  • Crawls and batches are async — poll status before fetching results.
  • Only fetch what the current task needs. Do not scrape unnecessarily.
