Tavily Extract
v1.0.0
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their...
Security Scan
OpenClaw verdict: Suspicious (high confidence)
Purpose & Capability
The name/description ("use Tavily extract API") matches the script's behavior: it sends requests to Tavily endpoints and tries to obtain a Tavily token. However, the registry metadata declares no required env vars or binaries, while the script actually expects and uses TAVILY_API_KEY (env), ~/.mcp-auth token files, and external tools (jq, curl, npx, base64, find, sed, grep). This undocumented dependency mismatch is a design/information-quality problem.
Instruction Scope
The SKILL.md documents the OAuth flow and mentions ~/.mcp-auth and ~/.claude/settings.json. The script will recursively search the user's ~/.mcp-auth for *_tokens.json and decode tokens, and if none present it launches an OAuth helper. Reading auth token files in the home directory is functionally related to obtaining a Tavily token, but it is sensitive (may expose presence of other cached tokens) and should have been explicitly declared. The instructions also suggest adding TAVILY_API_KEY to ~/.claude/settings.json which touches agent configuration files — the script itself does not write to those files but the guidance encourages modifying them.
Install Mechanism
There is no install spec, but the runtime script calls 'npx -y mcp-remote ...' to initiate OAuth. That will fetch and execute an npm package on demand. On-demand npm installs are a non-trivial risk vector (remote code executed without an explicit install step). The script also relies on system binaries (jq, curl, base64, find, sed, grep) that are not declared in the registry metadata.
Credentials
The registry declares no required env vars, yet the script uses TAVILY_API_KEY (env) if present, and otherwise searches local token caches. Access to ~/.mcp-auth is sensitive. While these items are explainable (they are used to authenticate to Tavily), the absence of explicit declared credentials/config requirements in metadata is a discrepancy and reduces transparency about what the skill will access.
Persistence & Privilege
The skill is not always-enabled, does not request elevated or persistent system presence, and does not modify other skills' configs. It does read files in the user's home directory and may launch a browser-based OAuth flow, which are expected for an auth-enabled client but should be documented.
What to consider before installing
This skill generally does what it says (calls Tavily to extract pages), but it has important gaps you should consider before installing:
- It will try to read ~/.mcp-auth/*_tokens.json and extract access_token values; those files can contain other cached auth tokens — only proceed if you trust the skill and the Tavily issuer check in the script.
- The script will run `npx -y mcp-remote ...` to perform OAuth if no token is found. That downloads and executes code from npm at runtime; review the mcp-remote package (or avoid letting the script run it) if you want to reduce risk.
- The metadata didn't list runtime dependencies, but the script requires jq, curl, npx, and standard Unix tools; ensure those are present and safe.
- For minimal exposure, consider creating and providing a dedicated TAVILY_API_KEY (set the env yourself) rather than allowing the script to scan ~/.mcp-auth or auto-run npx.
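One concrete preflight step: since the runtime dependencies are undeclared, verify they exist before first run. A minimal sketch (the helper name `check_deps` is mine; the binary list comes from the scan notes above):

```shell
# Verify undeclared runtime dependencies before first run.
# Prints each missing tool and returns non-zero if any is absent.
check_deps() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# e.g. check_deps jq curl npx base64 find sed grep
```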
If you need higher assurance, ask the skill author to: (1) declare required binaries/env in metadata, (2) avoid on-demand npx installs or document the exact npm package and version, and (3) limit or make optional any recursive scans of ~/, or at least clearly describe which files will be read.

Like a lobster shell, security has layers: review code before you run it.
Extract Skill
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
Authentication
The script uses OAuth via the Tavily MCP server. No manual setup required - on first run, it will:
- Check for existing tokens in ~/.mcp-auth/
- If none found, automatically open your browser for OAuth authentication
Note: You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. Sign up at tavily.com first if you don't have an account.
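For reference, the token lookup described above can be approximated as follows. This is a sketch, not the skill's actual code: the *_tokens.json pattern and the access_token field name are taken from the security-scan notes, and `find_cached_token` is a hypothetical helper (requires jq):

```shell
# Scan a directory (normally ~/.mcp-auth) for cached *_tokens.json
# files and print the first access_token found; requires jq.
find_cached_token() {
  local dir="$1" f tok
  for f in $(find "$dir" -name '*_tokens.json' 2>/dev/null); do
    tok=$(jq -r '.access_token // empty' "$f" 2>/dev/null)
    if [ -n "$tok" ]; then
      printf '%s\n' "$tok"
      return 0
    fi
  done
  return 1   # no usable token found; caller falls back to OAuth
}
```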
Alternative: API Key
If you prefer using an API key, get one at https://tavily.com and add to ~/.claude/settings.json:
```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```
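If you'd rather not hand-edit the file, a jq merge can add the key while preserving existing entries. A sketch; `add_tavily_key` is a hypothetical helper, and the env layout is the one shown above:

```shell
# Merge TAVILY_API_KEY into a settings file without clobbering other
# keys (requires jq). Pass the settings path and the key value.
add_tavily_key() {
  local settings="$1" key="$2" tmp
  tmp=$(mktemp)
  jq --arg k "$key" '.env = (.env // {}) + {TAVILY_API_KEY: $k}' \
    "$settings" > "$tmp" && mv "$tmp" "$settings"
}

# e.g. add_tavily_key ~/.claude/settings.json "tvly-your-api-key-here"
```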
Quick Start
Using the Script
```shell
./scripts/extract.sh '<json>'
```
Examples:
```shell
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```
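Before handing JSON to the script, it can help to validate the payload locally. A sketch using jq (`validate_payload` is a hypothetical helper; the 20-URL cap is the documented per-request limit):

```shell
# Validate an extract payload: .urls must be a non-empty array
# with at most 20 entries. Requires jq; returns non-zero on failure.
validate_payload() {
  printf '%s' "$1" | jq -e '
    (.urls | type) == "array"
    and (.urls | length) >= 1
    and (.urls | length) <= 20
  ' >/dev/null 2>&1
}
```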
Basic Extraction
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```
Multiple URLs with Query Focus
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```
API Reference
Endpoint
POST https://api.tavily.com/extract
Headers
| Header | Value |
|---|---|
| Authorization | Bearer <TAVILY_API_KEY> |
| Content-Type | application/json |
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| urls | array | Required | URLs to extract (max 20) |
| query | string | null | Reranks chunks by relevance |
| chunks_per_source | integer | 3 | Chunks per URL (1-5, requires query) |
| extract_depth | string | "basic" | basic or advanced (for JS pages) |
| format | string | "markdown" | markdown or text |
| include_images | boolean | false | Include image URLs |
| timeout | float | varies | Max wait (1-60 seconds) |
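The fields above can be assembled into a request body with jq. A sketch (`build_extract_body` is a hypothetical helper) that also clamps chunks_per_source to the documented 1-5 range:

```shell
# Build an /extract request body with jq. Arguments: query,
# chunks_per_source, then one or more URLs.
build_extract_body() {
  local query="$1" chunks="$2"
  shift 2
  # Clamp chunks_per_source to the documented 1-5 range.
  if [ "$chunks" -lt 1 ]; then chunks=1; fi
  if [ "$chunks" -gt 5 ]; then chunks=5; fi
  printf '%s\n' "$@" | jq -R . | jq -s \
    --arg q "$query" --argjson c "$chunks" \
    '{urls: ., query: $q, chunks_per_source: $c}'
}
```

The output can be passed straight to curl's --data or to the script above.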
Response Format
```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```
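A response in this shape can be summarized with jq before further processing. A sketch (`summarize_response` is a hypothetical helper):

```shell
# Summarize a /extract response read on stdin (requires jq):
# prints a one-line count, then the URL of each successful result.
summarize_response() {
  jq -r '
    "extracted \(.results | length), failed \(.failed_results | length)",
    .results[].url
  '
}
```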
Extract Depth
| Depth | When to Use |
|---|---|
| basic | Simple text extraction, faster |
| advanced | Dynamic/JS-rendered pages, tables, structured data |
Examples
Single URL Extraction
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```
Targeted Extraction with Query
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```
JavaScript-Heavy Pages
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```
Batch Extraction
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```
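Because each request accepts at most 20 URLs, longer lists need to be split client-side. A pure-shell sketch (`batch_urls` is a hypothetical helper):

```shell
# Split a URL list (passed as arguments) into batches of at most 20,
# printing one space-separated batch per line; each line can then be
# turned into a request body and sent as its own /extract call.
batch_urls() {
  local max=20 count=0 line="" url
  for url in "$@"; do
    line="$line $url"
    count=$((count + 1))
    if [ "$count" -eq "$max" ]; then
      printf '%s\n' "${line# }"
      line=""
      count=0
    fi
  done
  if [ -n "$line" ]; then
    printf '%s\n' "${line# }"
  fi
}
```

For 45 URLs this yields three lines of 20, 20, and 5 URLs.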
Tips
- Max 20 URLs per request; batch larger lists
- Use `query` + `chunks_per_source` to get only relevant content
- Try `basic` first, fall back to `advanced` if content is missing
- Set a longer `timeout` for slow pages (up to 60s)
- Check `failed_results` for URLs that couldn't be extracted
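The basic-then-advanced fallback from the tips can be captured in a tiny decision helper; the retry itself would re-issue the request with the returned depth. A sketch (`next_depth` is a hypothetical helper):

```shell
# Decide the extract_depth for a retry: escalate from "basic" to
# "advanced" only when the previous attempt had a failure (a URL in
# failed_results, or empty raw_content); otherwise keep the depth.
next_depth() {
  local current="$1" had_failure="$2"   # had_failure: "yes" or "no"
  if [ "$current" = "basic" ] && [ "$had_failure" = "yes" ]; then
    echo "advanced"
  else
    echo "$current"
  fi
}
```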