Install

```
openclaw skills install liang-tavily-best-practices
```

Build production-ready Tavily integrations with best practices baked in. Reference documentation for implementing web search, content extraction, crawling, and research in agentic workflows, RAG systems, or autonomous agents.

Tavily is a search API designed for LLMs, enabling AI applications to access real-time web data.
Python:

```bash
pip install tavily-python
```

JavaScript:

```bash
npm install @tavily/core
```
```python
from tavily import TavilyClient

# Option 1: Uses TAVILY_API_KEY env var (recommended)
client = TavilyClient()

# Option 2: Explicit API key
client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Async client for parallel queries
from tavily import AsyncTavilyClient
async_client = AsyncTavilyClient()
```
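The async client pays off when several independent queries can run at once. A minimal sketch, where the query strings are placeholders:

```python
import asyncio
from tavily import AsyncTavilyClient

async_client = AsyncTavilyClient()  # reads TAVILY_API_KEY from the environment

async def parallel_search(queries):
    # Fire all searches concurrently and wait for every response
    responses = await asyncio.gather(
        *(async_client.search(query=q) for q in queries)
    )
    return dict(zip(queries, responses))

# Example usage with illustrative queries
results = asyncio.run(parallel_search([
    "quantum computing breakthroughs",
    "LLM evaluation benchmarks",
]))
```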
For custom agents/workflows:
| Need | Method |
|---|---|
| Web search results | search() |
| Content from specific URLs | extract() |
| Content from entire site | crawl() |
| URL discovery from site | map() |
For out-of-the-box research:
| Need | Method |
|---|---|
| End-to-end research with AI synthesis | research() |
```python
response = client.search(
    query="quantum computing breakthroughs",  # Keep under 400 chars
    max_results=10,
    search_depth="advanced",  # highest relevance
    topic="general"           # or "news"
)

for result in response["results"]:
    print(f"{result['title']}: {result['score']}")
```
Key parameters:
- query - Keep under 400 characters
- max_results - 1-20
- search_depth - ultra-fast, fast, basic, advanced
- topic - general or news
- include_domains, exclude_domains - Filter sources
- time_range - day, week, month, year
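The filtering parameters compose naturally. A hedged sketch of a recency- and source-constrained news search, with the query and domains chosen purely for illustration:

```python
# Illustrative values only: query and domains are placeholders
response = client.search(
    query="semiconductor export controls",
    topic="news",
    time_range="week",                          # only results from the past week
    include_domains=["reuters.com", "ft.com"],  # restrict to trusted sources
    max_results=5
)
```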
```python
# Two-step pattern (recommended for control)
search_results = client.search(query="Python async best practices")
urls = [r["url"] for r in search_results["results"] if r["score"] > 0.5]

extracted = client.extract(
    urls=urls[:20],
    query="async patterns",  # Reranks chunks by relevance
    chunks_per_source=3      # Prevents context explosion
)
```
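Not every URL extracts cleanly. A short sketch of inspecting the response, assuming it carries a failed_results list alongside results and that each result includes raw_content:

```python
# Assumed response shape: {"results": [...], "failed_results": [...]}
for item in extracted["results"]:
    print(item["url"], len(item["raw_content"]))

for failed in extracted.get("failed_results", []):
    print("could not extract:", failed)
```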
Key parameters:
- urls - Max 20 URLs
- extract_depth - basic or advanced
- query - Reranks chunks by relevance
- chunks_per_source - 1-5 (prevents context explosion)

```python
response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find API documentation pages",  # Semantic focus
    chunks_per_source=3,                          # Token optimization
    select_paths=["/docs/.*", "/api/.*"]
)
```
Key parameters:
- url - Root URL to crawl
- max_depth - 1-5 (start with 1)
- max_breadth - Links per page
- limit - Total pages cap
- instructions - Natural language guidance
- chunks_per_source - 1-5 (for agentic use)
- select_paths, exclude_paths - Regex patterns

```python
response = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find all API and guide pages"
)

api_docs = [url for url in response["results"] if "/api/" in url]
```
Use map() when you only need URLs, not content (faster than crawl).
```python
import time

# For comprehensive multi-topic research
result = client.research(
    input="Analyze competitive landscape for X in SMB market",
    model="pro"  # or "mini" for focused queries, "auto" when unsure
)
request_id = result["request_id"]

# Poll until completed
response = client.get_research(request_id)
while response["status"] not in ["completed", "failed"]:
    time.sleep(10)
    response = client.get_research(request_id)

print(response["content"])  # The research report
```
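Polling loops like this are worth wrapping once. A small helper, hypothetical but built only from the calls shown above, that adds a timeout so a stuck job cannot block forever:

```python
import time

def wait_for_research(client, request_id, poll_seconds=10, timeout_seconds=600):
    """Hypothetical helper: poll get_research until done or timed out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        response = client.get_research(request_id)
        if response["status"] in ("completed", "failed"):
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"research {request_id} still running after {timeout_seconds}s")
```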
Key parameters:
- input - Research topic or question
- model - mini (quick), pro (comprehensive), auto
- stream - Stream results as they arrive
- output_schema - Structured JSON output
- citation_format - Citation style

Search depth comparison:

| Depth | Latency | Relevance | Use Case |
|---|---|---|---|
| ultra-fast | Lowest | Lower | Real-time chat, autocomplete |
| fast | Low | Good | Need chunks but latency matters |
| basic | Medium | High | General-purpose, balanced |
| advanced | Higher | Highest | Precision matters, research |
Rule of thumb: Start with basic, escalate to advanced for complex topics.
Rule of thumb: "what does X do?" → mini. "X vs Y vs Z" or "best way to..." → pro.
| Model | Use Case | Speed |
|---|---|---|
| mini | Single topic, targeted research | ~30s |
| pro | Comprehensive multi-angle analysis | ~60-120s |
| auto | API chooses based on complexity | Varies |
For agentic use (feeding results into context):
Always use instructions + chunks_per_source. This returns only relevant chunks instead of full pages, preventing context window explosion.
For data collection (saving to files):
Omit chunks_per_source to get full page content.
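A side-by-side sketch of the two crawl modes described above, with the URL and instructions as placeholder values:

```python
# Agentic mode: relevant chunks only, small context footprint
agentic = client.crawl(
    url="https://docs.example.com",
    instructions="Find API documentation pages",
    chunks_per_source=3
)

# Data-collection mode: omit chunks_per_source to get full pages
full_pages = client.crawl(
    url="https://docs.example.com",
    max_depth=1,
    limit=20
)
```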
```python
# Find relevant URLs first
search_results = client.search(query="React hooks documentation")
high_quality_urls = [r["url"] for r in search_results["results"] if r["score"] > 0.7]

# Extract content from best results
extracted = client.extract(
    urls=high_quality_urls[:10],
    query="useState and useEffect",
    chunks_per_source=3
)
```
```python
# Discover structure first
map_results = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find API documentation pages"
)

# Crawl only relevant sections
api_urls = [url for url in map_results["results"] if "/api/" in url]
crawl_results = client.crawl(
    url="https://docs.example.com/api",
    max_depth=1,
    limit=len(api_urls)
)
```
```python
result = client.research(
    input="Compare LangGraph vs CrewAI for multi-agent systems",
    model="pro"
)

# Poll until completed (same pattern as above)
response = client.get_research(result["request_id"])
while response["status"] not in ["completed", "failed"]:
    time.sleep(10)
    response = client.get_research(result["request_id"])

# The response includes citations
print(response["content"])    # AI-synthesized report
print(response["citations"])  # Source references
```
Best practices:

- Use include_domains to focus on trusted sources
- Use time_range for recent information
- Start crawls shallow (max_depth=1, limit=20)
- Set limit to prevent runaway crawls
- Use chunks_per_source for agentic workflows - prevents context explosion
- Use basic depth as default (cheaper than advanced)
- Cap max_results to what you'll actually use
- Skip include_raw_content unless needed
- Use chunks_per_source instead of full content for context

Error handling:

```python
from tavily import TavilyClient
from tavily.errors import TavilyError

client = TavilyClient()

try:
    result = client.search(query="example")
except TavilyError as e:
    print(f"Tavily API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
Tavily integrates with popular frameworks via the TavilySearch tool. See the official documentation for integration examples.