Exa Search

API key required
Other

Semantic search, web scraping, and content extraction optimized for AI Agents and LLMs. Use when you need highly relevant web search, clean Markdown extraction, extractive highlights, or similarity searches.

Install

openclaw skills install exa-search-ws

Exa Search Skill

This skill extends Manus's capabilities with a custom-built, semantic search engine optimized specifically for AI agents and LLMs. It enables neural search, clean Markdown extraction, extractive query highlights, and conceptual similarity searches.

When to Use

Use this skill when:

  1. You need to search the web using complex natural language prompts rather than raw keywords.
  2. You need clean, boilerplate-free Markdown text from web pages for LLM context windows.
  3. You need query-relevant extractive snippets (highlights) to reduce token consumption.
  4. You need to perform similarity searches using an existing URL as a conceptual query.
  5. You need to perform deep, multi-step web research or build structured lists using schema validation.

Core Capabilities & Commands

1. Neural Search (/search)

Query Exa's index using semantic embeddings. Unlike keyword matching, this understands the conceptual meaning of your prompt.

from exa_py import Exa
exa = Exa(api_key="YOUR_EXA_API_KEY")

results = exa.search(
    query="companies building innovative fusion energy reactors",
    type="auto",
    num_results=5,
    contents={"highlights": True}
)

2. Clean Web Extraction (/contents)

Retrieve webpage content stripped of navigation menus, sidebars, advertisements, and other boilerplate, returned as clean Markdown.

contents = exa.get_contents(
    urls=["https://example.com/target-article"],
    text=True,
    max_age_hours=24
)

3. Similar Link Discovery (/findSimilar)

Find conceptually similar pages in Exa's index using a starting URL as your query.

similar = exa.find_similar(
    url="https://arxiv.org/abs/2307.06435",
    num_results=5
)

Advanced Workflows & Best Practices

Cache Freshness & Live Crawling

By default, Exa serves cached pages to optimize speed. To control cache freshness, use max_age_hours instead of deprecated livecrawl parameters:

  • max_age_hours=0: Forces a live crawl of the URL.
  • max_age_hours=1: Uses cache if it's less than 1 hour old, otherwise performs a live crawl.
  • max_age_hours=-1: Cache-only lookup (never crawl).

Subpage Crawling

Automatically discover and extract content from linked subpages on a target site. Highly effective for documentation or news archives:

results = exa.get_contents(
    ["https://docs.exa.ai"],
    subpages=10,
    subpage_target=["api", "reference"],
    max_age_hours=24
)

RAG Integration Pattern

Always format extracted contents cleanly into XML blocks for downstream LLM generation:

context = "\n".join([
    f"<source><url>{r.url}</url><highlights>{r.highlights}</highlights></source>"
    for r in results.results
])

References & Resources

  • Detailed API endpoints and SDK configurations: API Reference
  • Command-line search utility: Execute /home/ubuntu/skills/exa-search/scripts/exa_search.py --help