AnyCrawl Skill
Web scraping, crawling, and search via SkillBoss API Hub — powered by high-performance multi-threaded crawling backends (Firecrawl, Jina AI, Apify, ScrapingDog, and more).
Setup
Method 1: Environment variable (Recommended)
export SKILLBOSS_API_KEY="your-api-key"
Make it permanent by adding to ~/.bashrc or ~/.zshrc:
echo 'export SKILLBOSS_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Get your API key at: https://heybossai.com
Method 2: OpenClaw gateway config
openclaw config.patch --set SKILLBOSS_API_KEY="your-api-key"
requires.env: SKILLBOSS_API_KEY
Functions
1. anycrawl_scrape
Scrape a single URL and convert to LLM-ready structured data via SkillBoss API Hub (type: "scraping").
Parameters:
url (string, required): URL to scrape
engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
timeout (number, optional): Timeout in milliseconds (default: 30000)
wait_for (number, optional): Delay before extraction in ms (browser engines only)
wait_for_selector (string/object/array, optional): Wait for CSS selectors
include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
exclude_tags (array, optional): Exclude these HTML tags
proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
json_options (object, optional): JSON extraction with schema/prompt
extract_source (string, optional): "markdown" (default) or "html"
Examples:
// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })
// Scrape SPA with Playwright
anycrawl_scrape({
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})
// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})
2. anycrawl_search
Search Google and return structured results via SkillBoss API Hub (type: "search").
Parameters:
query (string, required): Search query
engine (string, optional): Search engine - "google" (default)
limit (number, optional): Max results per page (default: 10)
offset (number, optional): Number of results to skip (default: 0)
pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
lang (string, optional): Language locale (e.g., "en", "zh", "vi")
safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
scrape_options (object, optional): Scrape each result URL with these options
Examples:
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })
// Multi-page search in Vietnamese
anycrawl_search({
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})
// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})
3. anycrawl_crawl_start
Start crawling an entire website (async job) via SkillBoss API Hub (type: "scraping" with crawl mode).
Parameters:
url (string, required): Seed URL to start crawling
engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
max_depth (number, optional): Max depth from seed URL (default: 10)
limit (number, optional): Max pages to crawl (default: 100)
include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
scrape_paths (array, optional): Only scrape URLs matching these patterns
scrape_options (object, optional): Per-page scrape options
Examples:
// Crawl entire website
anycrawl_crawl_start({
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})
// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})
// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})
4. anycrawl_crawl_status
Check crawl job status.
Parameters:
job_id (string, required): Crawl job ID
Example:
anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })
5. anycrawl_crawl_results
Get crawl results (paginated).
Parameters:
job_id (string, required): Crawl job ID
skip (number, optional): Number of results to skip (default: 0)
Example:
// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })
// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })
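A typical end-to-end crawl chains the calls above: start the job, poll its status, then page through the results. The sketch below is illustrative only; where the job ID lives in the start response (assumed here to be data.result.job_id) and the exact status values are assumptions, not confirmed by this document.
// 1. Start the crawl (async job)
const started = anycrawl_crawl_start({
  url: "https://docs.example.com",
  limit: 50
})
const job_id = started.data.result.job_id  // assumed field name

// 2. Poll until the job reports a terminal state (status values assumed)
anycrawl_crawl_status({ job_id })

// 3. Page through results 100 at a time until a page comes back empty
anycrawl_crawl_results({ job_id, skip: 0 })
anycrawl_crawl_results({ job_id, skip: 100 })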
6. anycrawl_crawl_cancel
Cancel a running crawl job.
Parameters:
job_id (string, required): Crawl job ID
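Example:
anycrawl_crawl_cancel({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })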
7. anycrawl_search_and_scrape
Quick helper: Search Google then scrape top results.
Parameters:
query (string, required): Search query
max_results (number, optional): Max results to scrape (default: 3)
scrape_engine (string, optional): Engine for scraping (default: "cheerio")
formats (array, optional): Output formats (default: ["markdown"])
lang (string, optional): Search language
Example:
anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})
Engine Selection Guide
| Engine | Best For | Speed | JS Rendering |
|---|---|---|---|
| cheerio | Static HTML, news, blogs | Fastest | No |
| playwright | SPAs, complex web apps | Slower | Yes |
| puppeteer | Chrome-specific features, performance metrics | Slower | Yes |
Response Format
All scraping responses follow this structure (result path: data.result):
{
  "data": {
    "result": { ... }
  }
}
All search responses follow this structure (result path: data.result):
{
  "data": {
    "result": [ ... ]
  }
}
Error response:
{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}
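When handling a response, check for the error shape before reading data.result. A minimal sketch, assuming the skill hands back the parsed JSON objects shown above:
const res = anycrawl_scrape({ url: "https://example.com" })
if (res.success === false) {
  // Error response: surface the human-readable message
  console.error(res.error, res.message)
} else {
  // Success response: the payload lives at data.result
  const page = res.data.result
}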
Common Error Codes
- 400 - Bad Request (validation errors)
- 401 - Unauthorized (invalid API key)
- 402 - Payment Required (insufficient credits)
- 404 - Not Found
- 429 - Too Many Requests (rate limit exceeded)
- 500 - Internal Server Error
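Of these, 429 is the one worth retrying with backoff. A rough sketch, assuming rate-limit failures come back as the error object above rather than being thrown (how the 429 surfaces in the error field is also an assumption):
// Retry a scrape with exponential backoff (1s, 2s, 4s, ...)
async function scrapeWithRetry(url, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = anycrawl_scrape({ url })
    if (res.success !== false) return res.data.result
    await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** i))
  }
  throw new Error("scrape failed after retries")
}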
API Limits
- Rate limits apply based on your plan
- Crawl jobs expire after 24 hours
- Maximum pages per crawl job: depends on your available credits
Links