Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

anydocs - Generic Documentation Indexing & Search

v1.0.2

Generic Documentation Indexing & Search. Index any documentation site (SPA/static) and search it instantly.

Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
Purpose & Capability
Name/description match the code: this is a documentation indexing/search tool that scrapes sites, caches pages, and builds a local search index. The included modules (scraper, indexer, cache, config) are proportionate to that purpose.
Instruction Scope
SKILL.md and README claim strict protections for browser rendering (HTTPS-only, no arbitrary URL injection). The code does not fully enforce those claims: (1) fetch command accepts arbitrary full URLs (if path starts with http) allowing fetching pages outside a configured profile; (2) sitemap parsing will include any <loc> entries without verifying they match base_url before scheduling scraping; (3) the HTTPS restriction for browser rendering is only enforced when use_browser AND gateway_token are both provided — local Playwright usage (or gateway usage without token) can render HTTP URLs. These gaps make it possible to direct the browser tool (or HTTP fetches) at URLs outside the intended profile, and could expose an OpenClaw gateway token to a remote gateway if gateway_url is set to a non-local host.
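
The sitemap gap in point (2) could be closed with a scope check along these lines. This is a minimal sketch, not the skill's actual code: `in_scope` and `filter_sitemap_locs` are hypothetical helper names, and the rule (same scheme, same host, path under base_url) is one reasonable reading of "match base_url".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def in_scope(url: str, base_url: str) -> bool:
    """True if url shares scheme and host with base_url and sits under its path."""
    u, b = urlsplit(url), urlsplit(base_url)
    if (u.scheme, u.netloc.lower()) != (b.scheme, b.netloc.lower()):
        return False
    base_path = b.path.rstrip("/")
    # Require a path-segment boundary so "/docs-evil" does not match "/docs".
    return u.path == base_path or u.path.startswith(base_path + "/")


def filter_sitemap_locs(sitemap_xml: str, base_url: str) -> list[str]:
    """Extract <loc> entries and keep only those inside base_url (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return [loc for loc in locs if in_scope(loc, base_url)]
```

Only the URLs that survive the filter would then be scheduled for scraping; anything pointing at another host, scheme, or path prefix is dropped before a fetch or browser render is attempted.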
Install Mechanism
No remote download/install spec is present; dependencies are local and pinned in requirements.txt. Optional Playwright usage may require running 'playwright install' (downloads browser binaries) but that's an explicit, local action. setup.sh encourages venv usage. No alarming remote URLs or extract/install-from-URL steps were found.
Credentials
The skill optionally reads OPENCLAW_GATEWAY_TOKEN from the environment to authorize requests to an OpenClaw gateway for browser rendering. That is reasonable for its claimed gateway integration, but the manifest/metadata did not declare required env vars and SKILL.md assumes users will provide the token. Because the gateway token is a powerful secret, the code paths that post it to gateway_url (which can be overridden via CLI) should be treated carefully—sending a token to a remote gateway_url would leak it. No other unrelated credentials are requested.
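
One way to keep the token from leaving the machine is to release it only when gateway_url points at a loopback address. The sketch below assumes that policy; `is_loopback_gateway` and `token_for_gateway` are hypothetical names, not functions from the skill.

```python
import ipaddress
import os
from typing import Optional
from urllib.parse import urlsplit


def is_loopback_gateway(gateway_url: str) -> bool:
    """True if the gateway host is localhost or a loopback IP (127.0.0.0/8, ::1)."""
    host = urlsplit(gateway_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as remote.
        return False


def token_for_gateway(gateway_url: str) -> Optional[str]:
    """Return OPENCLAW_GATEWAY_TOKEN only for a local gateway; otherwise withhold it."""
    if not is_loopback_gateway(gateway_url):
        return None
    return os.environ.get("OPENCLAW_GATEWAY_TOKEN")
```

With this in place, overriding gateway_url on the command line with a remote host would result in an unauthenticated request rather than a leaked token.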
Persistence & Privilege
The skill does not request always:true, does not alter other skills, and stores config/cache under ~/.anydocs which is consistent with its function. It requires normal user-level filesystem access to store profiles and caches.
Scan Findings in Context
[no_findings] expected: Static pre-scan injection signals: none detected. That does not remove the runtime concerns identified by reading the code (URL validation and gateway token usage).
What to consider before installing
This skill largely does what it claims, but review these important points before installing or supplying secrets:

  • Gateway token safety: Only provide OPENCLAW_GATEWAY_TOKEN if you trust the gateway endpoint. Prefer running a local gateway (default http://127.0.0.1:18789). Do not set gateway_url to an untrusted remote host while supplying a token; that will forward your token to that host.
  • Verify site scope: The code allows fetching an arbitrary full URL via the 'fetch' command and will include any <loc> entries from sitemaps without strict domain filtering. If you will index internal or sensitive documentation, make sure profiles' sitemap/base_url entries are correct and review discovered URLs before enabling browser rendering.
  • HTTPS and browser rendering: The README claims browser rendering rejects HTTP, but the code only enforces HTTPS in some code paths. If you need to prevent rendering of HTTP hosts, add an explicit validation or avoid use_browser/playwright for HTTP targets.
  • Use a virtual environment: setup.sh recommends and enables a venv; follow that to avoid system-wide package changes.
  • Minimize token exposure: Load gateway tokens from environment variables (not command-line args or version control), and remove tokens from the environment when not needed.
  • If you need higher assurance, request the missing code changes: (1) enforce domain checks on sitemap URLs, (2) forbid fetch of full external URLs unless explicitly allowed, and (3) enforce HTTPS for all browser-rendering paths or document the exact exceptions.

Given these inconsistencies, treat the skill as suspicious (likely well-intentioned but with security oversights) and remediate the above before using with sensitive credentials or internal docs.
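
The fetch and HTTPS checks recommended above might be combined into a single validation step. The following is a sketch under stated assumptions: `resolve_fetch_target` is a hypothetical function, and the policy (reject anything outside the profile's host and path, require HTTPS whenever the browser is used) is an illustration of the requested changes rather than the skill's current behavior.

```python
from urllib.parse import urljoin, urlsplit


def resolve_fetch_target(path_or_url: str, base_url: str, use_browser: bool = False) -> str:
    """Resolve a fetch argument against base_url and reject anything outside the profile."""
    base = base_url.rstrip("/") + "/"
    if path_or_url.startswith(("http://", "https://")):
        target = path_or_url
    else:
        target = urljoin(base, path_or_url.lstrip("/"))
    t, b = urlsplit(target), urlsplit(base)
    # Same host, and the path must sit at or below the profile's base path.
    if t.netloc.lower() != b.netloc.lower() or not (t.path + "/").startswith(b.path):
        raise ValueError(f"{target} is outside profile base {base_url}")
    # Browser rendering is only allowed over HTTPS, regardless of token or gateway.
    if use_browser and t.scheme != "https":
        raise ValueError(f"browser rendering requires HTTPS: {target}")
    return target
```

A call such as `resolve_fetch_target("resources/webhook", "https://discord.com/developers/docs")` resolves inside the profile, while a full URL on another host or a `../` escape raises `ValueError` before any request is made.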


Tags: api-docs, documentation, indexing, latest, search, security, spa, web-scraping
1.5k downloads · 0 stars · 3 versions · Updated 2h ago · v1.0.2 · MIT-0

anydocs - Generic Documentation Indexing & Search

A powerful, reusable skill for indexing and searching ANY documentation site.

What It Does

anydocs solves a real problem: accessing documentation from code or CLI. Instead of opening a browser every time, you can:

  • Index any documentation site (Discord, OpenClaw, internal docs, etc.)
  • Search instantly from the command line or Python API
  • Cache pages locally to avoid repeated network calls
  • Configure multiple profiles for different doc sites

When to Use It

Use anydocs when you need to:

  • Quickly look up API documentation without leaving the terminal
  • Build agents that need to reference docs
  • Extract specific information from documentation
  • Search across multiple documentation sites
  • Integrate docs into your workflow

Key Features

🔍 Multi-Method Search

  • Keyword search: Fast, term-based matching with BM25-style scoring
  • Hybrid search: Keyword + phrase proximity for better relevance
  • Regex search: Advanced pattern matching for power users

🌐 Works with Any Docs Site

  • Sitemap-based discovery (standard XML sitemap)
  • Fallback crawling from base URL
  • HTML content extraction with smart selector detection
  • Automatic rate limiting to be respectful

💾 Smart Caching

  • Pages cached locally with 7-day TTL (configurable)
  • Search indexes cached for instant second searches
  • Cache statistics and cleanup commands
  • Respects cache invalidation

⚙️ Profile-Based Configuration

  • Support multiple doc sites simultaneously
  • Per-profile search methods and cache TTLs
  • Configuration stored in ~/.anydocs/config.json
  • Examples for Discord, OpenClaw, and custom sites

🌐 JavaScript Rendering (Optional)

  • Uses Playwright to render client-side SPAs (Single Page Apps)
  • Automatically discovers links on JS-heavy sites like Discord docs
  • Gracefully falls back to standard HTTP if Playwright is unavailable
  • Configure per-discovery session or globally per profile

Installation

cd /path/to/skills/anydocs
pip install -r requirements.txt
chmod +x anydocs.py

Optional: Browser-based rendering (for JavaScript-heavy sites)

For sites like Discord that use client-side rendering, install Playwright:

pip install playwright==1.40.0
playwright install  # Downloads browser binaries (Chromium, Firefox, WebKit); use `playwright install chromium` for Chromium only

If Playwright is unavailable, anydocs gracefully falls back to standard HTTP fetching.

Quick Start

1. Configure a Documentation Site

python anydocs.py config vuejs \
  https://vuejs.org \
  https://vuejs.org/sitemap.xml

2. Build the Index

python anydocs.py index vuejs

This discovers all pages via sitemap, scrapes content, and builds a searchable index.

3. Search

python anydocs.py search "composition api" --profile vuejs
python anydocs.py search "reactivity" --profile vuejs --limit 5

4. Fetch a Specific Page

python anydocs.py fetch "guide/introduction" --profile vuejs

CLI Commands

Configuration

# Add or update a profile
anydocs config <profile> <base_url> <sitemap_url> [--search-method hybrid] [--ttl-days 7]

# List configured profiles
anydocs list-profiles

Indexing

# Build index for a profile
anydocs index <profile>

# Force re-index (skip cache)
anydocs index <profile> --force

Search

# Basic keyword search
anydocs search "query" --profile discord

# Limit results
anydocs search "query" --profile discord --limit 5

# Regex search
anydocs search "^API" --profile discord --regex

Fetch

# Fetch a specific page (URL or path)
anydocs fetch "https://discord.com/developers/docs/resources/webhook"
anydocs fetch "resources/webhook" --profile discord

Cache Management

# Show cache statistics
anydocs cache status

# Clear all cache
anydocs cache clear

# Clear specific profile's cache
anydocs cache clear --profile discord

Python API

For use in agents and scripts:

from lib.config import ConfigManager
from lib.scraper import DiscoveryEngine
from lib.indexer import SearchIndex

# Load configuration
config_mgr = ConfigManager()
config = config_mgr.get_profile("discord")

# Scrape documentation
scraper = DiscoveryEngine(config["base_url"], config["sitemap_url"])
pages = scraper.fetch_all()

# Build search index
index = SearchIndex()
index.build(pages)

# Search
results = index.search("webhooks", limit=10)
for result in results:
    print(f"{result['title']} ({result['relevance_score']})")
    print(f"  {result['url']}")

Configuration File Format

Configuration is stored in ~/.anydocs/config.json:

{
  "discord": {
    "name": "discord",
    "base_url": "https://discord.com/developers/docs",
    "sitemap_url": "https://discord.com/developers/docs/sitemap.xml",
    "search_method": "hybrid",
    "cache_ttl_days": 7
  },
  "openclaw": {
    "name": "openclaw",
    "base_url": "https://docs.openclaw.ai",
    "sitemap_url": "https://docs.openclaw.ai/sitemap.xml",
    "search_method": "hybrid",
    "cache_ttl_days": 7
  }
}
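
A script that reads this file directly could validate each profile before use. The sketch below is illustrative only: `parse_profiles` is a hypothetical helper, the set of required keys is an assumption, and the defaults mirror the values shown above (hybrid search, 7-day TTL).

```python
import json

# Assumption: these three keys must be present for a profile to be usable.
REQUIRED_KEYS = {"name", "base_url", "sitemap_url"}
DEFAULTS = {"search_method": "hybrid", "cache_ttl_days": 7}


def parse_profiles(config_text: str) -> dict:
    """Parse config.json content and return {profile_name: settings} with defaults applied."""
    raw = json.loads(config_text)
    profiles = {}
    for name, settings in raw.items():
        missing = REQUIRED_KEYS - set(settings)
        if missing:
            raise ValueError(f"profile {name!r} is missing {sorted(missing)}")
        # Explicit settings override the defaults.
        profiles[name] = {**DEFAULTS, **settings}
    return profiles
```

The file content would typically come from `Path.home() / ".anydocs" / "config.json"`, read with `read_text()` and passed to `parse_profiles`.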

Search Methods

Keyword Search

  • Speed: Fast
  • Best for: Common terms, exact matches
  • How it works: Term matching with position weighting (title > tags > content)
  • Example: anydocs search "webhooks"
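
Position weighting of this kind can be sketched as follows. The weights and function names are assumptions chosen for illustration, and the scoring uses plain term counts rather than whatever formula lib/indexer.py actually applies.

```python
import re

# Hypothetical weights: a title hit outranks a tag hit, which outranks a body hit.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "content": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query: str, page: dict) -> float:
    """Sum weighted occurrences of each query term across title, tags, and content."""
    terms = tokenize(query)
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = page.get(field, "")
        if isinstance(value, list):  # tags may be stored as a list of strings
            value = " ".join(value)
        tokens = tokenize(value)
        score += weight * sum(tokens.count(term) for term in terms)
    return score


def keyword_search(query: str, pages: list[dict], limit: int = 10) -> list[dict]:
    """Return matching pages, best first, each annotated with relevance_score."""
    scored = [(keyword_score(query, p), p) for p in pages]
    hits = [(s, p) for s, p in scored if s > 0]
    hits.sort(key=lambda sp: sp[0], reverse=True)
    return [{**p, "relevance_score": s} for s, p in hits[:limit]]
```

A page whose title contains the query term ranks above one that only mentions it in the body, which is the "title > tags > content" ordering described above.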

Hybrid Search (Default)

  • Speed: Fast
  • Best for: Natural language queries
  • How it works: Keyword search + phrase proximity scoring
  • Example: anydocs search "how to set up webhooks"
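
As a rough illustration of combining keyword matching with a phrase signal, the sketch below adds a bonus whenever the query terms appear contiguously and in order. This is the simplest form of proximity scoring and is an assumption, not the skill's actual algorithm; `hybrid_score` and `phrase_bonus` are hypothetical names.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hybrid_score(query: str, text: str, phrase_bonus: float = 2.0) -> float:
    """Keyword count plus a bonus for each exact in-order occurrence of the query phrase."""
    terms = tokenize(query)
    tokens = tokenize(text)
    if not terms:
        return 0.0
    # Keyword part: how often each distinct query term occurs in the text.
    keyword_part = float(sum(tokens.count(t) for t in set(terms)))
    # Phrase part: slide a window of len(terms) over the text and count exact matches.
    n = len(terms)
    phrase_hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms)
    return keyword_part + phrase_bonus * phrase_hits
```

Two pages containing the same words then score differently: the one where the words appear together as a phrase ranks higher than the one where they are scattered.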

Regex Search

  • Speed: Medium
  • Best for: Complex patterns
  • How it works: Compiled regex pattern matching across all content
  • Example: anydocs search "^(GET|POST)" --regex
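
A compiled-pattern scan over page content might look like the sketch below. The page dictionary shape (`url`, `content`) follows the Python API example, while `regex_search`, `match_count`, and the use of `re.MULTILINE` (so that `^` matches at the start of every line) are assumptions.

```python
import re


def regex_search(pattern: str, pages: list[dict], limit: int = 10) -> list[dict]:
    """Return pages whose content matches the pattern, ordered by number of matches."""
    compiled = re.compile(pattern, re.MULTILINE)  # compile once, reuse for every page
    results = []
    for page in pages:
        matches = compiled.findall(page.get("content", ""))
        if matches:
            results.append({"url": page["url"], "match_count": len(matches)})
    results.sort(key=lambda r: r["match_count"], reverse=True)
    return results[:limit]
```

Because the pattern is compiled once and applied to every page, cost grows with total content size, which is consistent with the "Medium" speed rating above.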

Caching Behavior

  • Pages: Cached as JSON with 7-day TTL (configurable)
  • Indexes: Cached after indexing, invalidated on TTL expiry
  • Cache location: ~/.anydocs/cache/
  • Manual refresh: Use --force flag or clear cache
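
The TTL and --force behavior described above can be sketched with an in-memory cache. This is a simplified illustration: the real cache stores JSON files under ~/.anydocs/cache/, and `is_fresh`, `get_page`, and the `fetched_at` field are hypothetical names.

```python
import time
from typing import Callable, Optional

SECONDS_PER_DAY = 86_400


def is_fresh(fetched_at: float, ttl_days: float = 7, now: Optional[float] = None) -> bool:
    """True while the entry is no older than the TTL."""
    now = time.time() if now is None else now
    return (now - fetched_at) <= ttl_days * SECONDS_PER_DAY


def get_page(url: str, cache: dict, fetch: Callable[[str], dict],
             ttl_days: float = 7, force: bool = False,
             now: Optional[float] = None) -> dict:
    """Serve from cache when fresh; otherwise fetch and store with a new timestamp."""
    entry = cache.get(url)
    if entry and not force and is_fresh(entry["fetched_at"], ttl_days, now):
        return entry["page"]
    page = fetch(url)
    cache[url] = {"page": page, "fetched_at": time.time() if now is None else now}
    return page
```

Passing `force=True` corresponds to the --force flag: the cached copy is ignored and replaced even if it has not expired.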

Performance Notes

  • First index build takes 2-10 minutes depending on site size
  • Subsequent searches are instant (cached indexes)
  • Rate limit: 0.5s per page to be respectful
  • Typical search returns ~100 results in <100ms

Troubleshooting

"No index for 'profile'" error

Run anydocs index <profile> first to build the index.

Sitemap not found

Check the sitemap URL. If the sitemap is unavailable, anydocs falls back to crawling from base_url.

Slow indexing

This is normal for large sites. Rate limiting prevents overwhelming servers.

Cache grows too large

Run anydocs cache clear or set --ttl-days to a smaller value.

Examples

Vue.js Framework Docs (SPA Example)

anydocs config vuejs \
  https://vuejs.org \
  https://vuejs.org/sitemap.xml
anydocs index vuejs
anydocs search "composition api" --profile vuejs

Next.js API Docs

anydocs config nextjs \
  https://nextjs.org \
  https://nextjs.org/sitemap.xml
anydocs index nextjs
anydocs search "app router" --profile nextjs

Internal Company Documentation

anydocs config internal \
  https://docs.company.local \
  https://docs.company.local/sitemap.xml
anydocs index internal --force
anydocs search "deployment" --profile internal

Architecture

  • scraper.py: Discovers URLs via sitemap, fetches and parses HTML
  • indexer.py: Builds searchable indexes, implements multiple search strategies
  • config.py: Manages configuration profiles
  • cache.py: TTL-based file caching for pages and indexes
  • cli.py: Click-based command-line interface

Contributing

To add new documentation sites, run:

anydocs config <profile> <base_url> <sitemap_url>

To extend search functionality, modify lib/indexer.py.

License

Part of the OpenClaw system.
