Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Tavily Extract

v1.0.0

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their...

Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The name/description ("use Tavily extract API") matches the script's behavior: it sends requests to Tavily endpoints and tries to obtain a Tavily token. However, the registry metadata declares no required env vars or binaries, while the script actually expects TAVILY_API_KEY (env), ~/.mcp-auth token files, and external tools (jq, curl, npx, base64, find, sed, grep). This undocumented dependency mismatch is a design and information-quality problem.
Instruction Scope
The SKILL.md documents the OAuth flow and mentions ~/.mcp-auth and ~/.claude/settings.json. The script recursively searches the user's ~/.mcp-auth for *_tokens.json files and decodes their tokens; if none are present, it launches an OAuth helper. Reading auth token files in the home directory is functionally related to obtaining a Tavily token, but it is sensitive (it may expose the presence of other cached tokens) and should have been explicitly declared. The instructions also suggest adding TAVILY_API_KEY to ~/.claude/settings.json, which touches agent configuration files; the script itself does not write to those files, but the guidance encourages modifying them.
Install Mechanism
There is no install spec, but the runtime script calls 'npx -y mcp-remote ...' to initiate OAuth. That will fetch and execute an npm package on demand. On-demand npm installs are a non-trivial risk vector (remote code executed without an explicit install step). The script also relies on system binaries (jq, curl, base64, find, sed, grep) that are not declared in the registry metadata.
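The missing-dependency finding can be checked up front. A minimal sketch; the tool list comes from this report's findings, not from any declared metadata:

```shell
# Report which of the script's undeclared dependencies are on PATH.
status=$(for tool in jq curl npx base64 find sed grep; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done)
echo "$status"
```

Any line reporting MISSING means the script will fail (or behave unexpectedly) at that step.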
Credentials
The registry declares no required env vars, yet the script uses TAVILY_API_KEY (env) if present, and otherwise searches local token caches. Access to ~/.mcp-auth is sensitive. While these items are explainable (they are used to authenticate to Tavily), the absence of explicit declared credentials/config requirements in metadata is a discrepancy and reduces transparency about what the skill will access.
Persistence & Privilege
The skill is not always-enabled, does not request elevated or persistent system presence, and does not modify other skills' configs. It does read files in the user's home directory and may launch a browser-based OAuth flow, which are expected for an auth-enabled client but should be documented.
What to consider before installing
This skill generally does what it says (calls Tavily to extract pages), but it has important gaps you should consider before installing:

  • It will try to read ~/.mcp-auth/*_tokens.json and extract access_token values; those files can contain other cached auth tokens. Only proceed if you trust the skill and the Tavily issuer check in the script.
  • The script will run `npx -y mcp-remote ...` to perform OAuth if no token is found. That downloads and executes code from npm at runtime; review the mcp-remote package (or avoid letting the script run it) if you want to reduce risk.
  • The metadata didn't list runtime dependencies, but the script requires jq, curl, npx, and standard Unix tools; ensure those are present and safe.
  • For minimal exposure, consider creating and providing a dedicated TAVILY_API_KEY (set the env yourself) rather than allowing the script to scan ~/.mcp-auth or auto-run npx.

If you need higher assurance, ask the skill author to:

  1. Declare required binaries/env in metadata
  2. Avoid on-demand npx installs, or document the exact npm package and version
  3. Limit or make optional any recursive scans of ~/, or at least clearly describe what files will be read
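Following the last point above, a dedicated key can be supplied up front so the script never needs to scan ~/.mcp-auth or invoke npx. A minimal sketch; the key value is a placeholder:

```shell
# Provide a dedicated key so the script skips the ~/.mcp-auth scan and the
# on-demand `npx -y mcp-remote` OAuth helper. The value is a placeholder.
export TAVILY_API_KEY="tvly-dedicated-key-here"

# Confirm the variable is visible to child processes before running the skill.
if [ -n "$TAVILY_API_KEY" ]; then
  echo "TAVILY_API_KEY is set"
fi
```

A key scoped only to this skill also limits what is exposed if the key ever leaks.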

Like a lobster shell, security has layers — review code before you run it.

latest: vk971kjx7d1a99ayggq6ffvbkzd82mpys
348 downloads · 1 star · 1 version
Updated 7h ago · v1.0.0 · MIT-0

Extract Skill

Extract clean content from specific URLs. Ideal when you know which pages you want content from.

Authentication

The script uses OAuth via the Tavily MCP server. No manual setup required - on first run, it will:

  1. Check for existing tokens in ~/.mcp-auth/
  2. If none found, automatically open your browser for OAuth authentication

Note: You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. Sign up at tavily.com first if you don't have an account.
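The token check in step 1 can be reproduced by hand. This is an illustrative sketch: it builds a sample cache in a temp directory so it is safe to run, and the *_tokens.json layout is an assumption taken from the scan report; the real files live under ~/.mcp-auth and may differ.

```shell
# Build a sample token cache in a temp dir (illustrative layout only).
demo="$(mktemp -d)"
mkdir -p "$demo/.mcp-auth/tavily"
printf '{"access_token":"tok123"}' > "$demo/.mcp-auth/tavily/server_tokens.json"

# Find *_tokens.json files and pull out access_token, as the script does.
# (sed keeps this demo dependency-free; the script itself reportedly uses jq.)
token=$(find "$demo/.mcp-auth" -name '*_tokens.json' -exec \
  sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p' {} \;)
echo "$token"
```

Running the same `find` against your real ~/.mcp-auth shows exactly which cached tokens the script would see.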

Alternative: API Key

If you prefer using an API key, get one at https://tavily.com and add to ~/.claude/settings.json:

{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}

Quick Start

Using the Script

./scripts/extract.sh '<json>'

Examples:

# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
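Payloads like the ones above can also be assembled with jq instead of hand-quoted JSON, which avoids shell-quoting mistakes (jq is already a dependency of the script, per the scan results):

```shell
# Assemble the request payload with jq rather than hand-writing quotes.
urls='["https://example.com/page1","https://example.com/page2"]'
payload=$(jq -cn --argjson urls "$urls" --arg q "authentication API" \
  '{urls: $urls, query: $q, chunks_per_source: 3}')
echo "$payload"   # pass this to ./scripts/extract.sh "$payload"
```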

Basic Extraction

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'

Multiple URLs with Query Focus

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'

API Reference

Endpoint

POST https://api.tavily.com/extract

Headers

Header          Value
Authorization   Bearer <TAVILY_API_KEY>
Content-Type    application/json

Request Body

Field               Type      Default      Description
urls                array     Required     URLs to extract (max 20)
query               string    null         Reranks chunks by relevance
chunks_per_source   integer   3            Chunks per URL (1-5, requires query)
extract_depth       string    "basic"      "basic" or "advanced" (for JS pages)
format              string    "markdown"   "markdown" or "text"
include_images      boolean   false        Include image URLs
timeout             float     varies       Max wait (1-60 seconds)

Response Format

{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
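A saved response in this shape can be split into successes and failures with jq. A small sketch using inline sample data so it is runnable as-is:

```shell
# Sample response in the documented shape (inline so the sketch is runnable).
response='{"results":[{"url":"https://example.com/article","raw_content":"# Title"}],"failed_results":[],"response_time":2.3}'

# Pull out the successfully extracted URLs and count the failures.
ok_urls=$(echo "$response" | jq -r '.results[].url')
fail_count=$(echo "$response" | jq '.failed_results | length')
echo "$ok_urls"
echo "failures: $fail_count"
```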

Extract Depth

Depth      When to Use
basic      Simple text extraction, faster
advanced   Dynamic/JS-rendered pages, tables, structured data

Examples

Single URL Extraction

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'

Targeted Extraction with Query

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'

JavaScript-Heavy Pages

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'

Batch Extraction

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'

Tips

  • Max 20 URLs per request - batch larger lists
  • Use query + chunks_per_source to get only relevant content
  • Try basic first, fall back to advanced if content is missing
  • Set longer timeout for slow pages (up to 60s)
  • Check failed_results for URLs that couldn't be extracted
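The first tip can be sketched as a batching loop. The urls.txt here is generated for the demo so the example is self-contained; in practice it would be your own one-URL-per-line list:

```shell
# Generate a demo list of 45 URLs, then split into batches of 20
# (the documented per-request limit).
dir=$(mktemp -d) && cd "$dir"
printf 'https://example.com/p%d\n' $(seq 1 45) > urls.txt
split -l 20 urls.txt batch_

# Each batch file would become one /extract request; here we just report sizes.
counts=$(for f in batch_*; do wc -l < "$f"; done)
echo $counts
```

Each `batch_*` file can then be fed to one of the curl calls above (or to extract.sh) in turn.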
